llms.py

Latest Features

Latest features and updates in llms.py

Get the latest features by updating to the latest version:

pip install llms-py --upgrade

After upgrading, it's recommended to also upgrade any external extensions:

llms --update all

Mar 3, 2026

Support for Fireworks Large Language Models

Added support for Fireworks AI as a new provider. Fireworks is a fast inference platform hosting leading open-source models, including GLM 5, Kimi K2.5, MiniMax M2.5 and DeepSeek V3.2, at speeds of up to 200 tok/s.

All text models support reasoning and tool use, making Fireworks an excellent choice for agentic workflows where speed matters. Here's Kimi K2.5 via Fireworks creating a retro Tetris game with full tool calling support:

Kimi K2.5 via Fireworks

Results at 125 tok/s

Fireworks Image Generation Models

Fireworks also hosts Black Forest Labs' FLUX.1 image generation models with fast inference and competitive per-image pricing:

Model                 Cost       Pricing
FLUX.1 Kontext Max    $0.08      per image
FLUX.1 Kontext Pro    $0.04      per image
FLUX.1 Dev FP8        $0.0005    per step
FLUX.1 Schnell FP8    $0.00035   per step
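
Because the Dev and Schnell models are billed per step rather than per image, the cost of a single image depends on how many inference steps are used. A rough sketch of that arithmetic follows; the prices are copied from the table above, while the step counts and the image_cost helper are illustrative assumptions, not values or APIs from Fireworks or llms.py:

```python
from decimal import Decimal

# Prices copied from the pricing table above (USD).
PER_IMAGE = {
    "FLUX.1 Kontext Max": Decimal("0.08"),
    "FLUX.1 Kontext Pro": Decimal("0.04"),
}
PER_STEP = {
    "FLUX.1 Dev FP8": Decimal("0.0005"),
    "FLUX.1 Schnell FP8": Decimal("0.00035"),
}


def image_cost(model: str, steps: int = 1) -> Decimal:
    """Estimate the cost of one image; `steps` only matters for per-step models."""
    if model in PER_IMAGE:
        return PER_IMAGE[model]
    return PER_STEP[model] * steps


# The step counts below are assumptions chosen purely for illustration.
print(image_cost("FLUX.1 Dev FP8", steps=28))     # 0.0140
print(image_cost("FLUX.1 Schnell FP8", steps=4))  # 0.00140
print(image_cost("FLUX.1 Kontext Pro"))           # 0.04
```

Decimal is used instead of float so that small per-step prices add up without rounding noise.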

These models can be found in the model selector by selecting the Image output filter.

Here are examples to demonstrate the quality of each of the FLUX.1 models:

FLUX.1 Kontext Max

FLUX.1 Kontext Pro

FLUX.1 Dev

FLUX.1 Schnell

To get started, set your Fireworks API key:

export FIREWORKS_API_KEY=your_api_key_here

Then reset your providers configuration to pick up the new Fireworks models in providers-extra.json:

llms --reset providers

Mar 2, 2026

Credentials Auth Provider

The built-in credentials extension enables Username/Password authentication for your Application, including a Sign In page, user registration, role assignment, and account locking. It provides full user management through both the CLI and a web-based Admin UI, along with account self-service for all authenticated users.

Credentials is the default Auth Provider that's automatically enabled when at least one user has been created:

llms --adduser admin

After logging in as admin, you can create additional users from the Manage Users page, which is accessible from the user menu.

Manage Users

Create User

Change Password

Lock User

Delete User

My Account

See the Credentials Auth docs for more details.

Feb 27, 2026

Nano Banana 2

Added Google's latest Gemini Nano Banana 2 model, available from both the Google and OpenRouter providers. The Google/Gemini provider supports multi-turn chat history in conversations, while OpenRouter sends a fresh chat request for each message.

You can find it in the model selector by selecting the Image output filter and searching for "Nano Banana 2".

To get the latest model info with Nano Banana 2, reset your providers configuration:

llms --reset providers

Gallery pages now use optimized thumbnails instead of full-size images. Full-size images, which are often close to or over 2MB, are replaced in the gallery by thumbnails under 10KB, delivering a noticeable performance improvement when scrolling through the infinite-scrolling gallery pages.

Reset to latest configuration

New versions sometimes include changes to the llms.json config, which isn't automatically updated.

Use the --reset option to reset the default configuration files back to their factory defaults. Available reset options (config, providers, all):

config - Reset ~/.llms/llms.json to default

llms --reset config

providers - Reset ~/.llms/providers.json and ~/.llms/providers-extra.json

llms --reset providers

Feb 25, 2026

New Theme Support

llms.py now ships with 8 built-in themes - 🎨 4 dark and 🎨 4 light - so you can personalize the Web UI to match your style. Switch themes instantly from the home page or Settings, or create your own custom themes with full control over colors, typography, and background assets.

Built-in themes: Nord, Matrix, Blue Smoke, Dark, Light Slate, Soft Pink, Light Sky, Light

See the Themes docs for details on the available themes and how to create your own.

Feb 19, 2026

Agent Browser Editor

The browser extension provides an integrated environment for creating, editing, and running automated browser scripts powered by Vercel's agent-browser.

  • 🖥️ Live Browser Preview - Clickable real-time screenshot with full mouse, keyboard, and scroll interaction
  • 📋 Element Inspector - Auto-refreshing snapshot giving scripts and AI a precise map of interactive elements
  • ✍️ AI Script Generation - Describe what you want in English and the AI generates the full automation script
  • 🤖 AI-Assisted Editing - Select lines in the editor and describe changes to iterate on scripts incrementally
  • ▶️ Run Selected Text - Run highlighted portions of a script to test individual steps in isolation
  • 💾 Saved Scripts Library - Build a library of reusable browser automations accessible from the sidebar

See the browser docs for more details.

Feb 15, 2026

Standard Input

llms now accepts OpenAI-compatible Chat Completion requests via standard input, making it easy to integrate into shell pipelines and scripts.

When JSON is piped in, llms detects it automatically - no extra flags needed:

cat request.json | llms

Build requests inline with a heredoc:

llms <<EOF
{
  "model": "Minimax M2.5",
  "messages": [
    { "role": "user", "content": "Capital of France?" }
  ]
}
EOF

Combine with other CLI tools to generate requests dynamically:

# jq escapes quotes and newlines in notes.txt so the request stays valid JSON
jq -n --arg notes "$(cat notes.txt)" '{messages:[{role:"user",content:("Summarize: " + $notes)}]}' | llms
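
When the interpolated text may contain quotes, backslashes, or newlines, building the request with a JSON library avoids escaping problems. The following is a minimal sketch rather than anything shipped with llms.py: build_request and run_llms are hypothetical helper names, and the only behavior assumed of llms is what is described above, namely that it reads a Chat Completion request from standard input.

```python
import json
import subprocess


def build_request(notes: str) -> str:
    """Return an OpenAI-compatible Chat Completion request as a JSON string."""
    request = {
        "messages": [
            {"role": "user", "content": f"Summarize: {notes}"},
        ]
    }
    # json.dumps escapes quotes, backslashes, and newlines found in `notes`.
    return json.dumps(request)


def run_llms(request_json: str) -> str:
    """Pipe the request to `llms` on stdin and return whatever it prints to stdout."""
    result = subprocess.run(
        ["llms"],
        input=request_json,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if llms exits with a non-zero status
    )
    return result.stdout
```

With llms installed, a call such as run_llms(build_request(open("notes.txt").read())) would send the contents of notes.txt as the text to summarize.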

This pairs well with structured outputs support and jq to build end-to-end JSON pipelines:

(llms <<EOF
{
    "model": "moonshotai/kimi-k2-instruct",
    "messages": [{"role":"user", "content":"Return capital cities for: France, Italy, Spain, Japan." }],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "country_capitals",
            "schema": {
                "type": "object",
                "properties": {
                    "capitals": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "country": { "type": "string" },
                                "capital": { "type": "string" }
                            },
                            "required": ["country","capital"]
                        }
                    }
                },
                "required": ["capitals"]
            }
        }
    }
}
EOF
) | jq -r '.capitals[] | "\(.country): \(.capital)"'

Output:

France: Paris
Italy: Rome
Spain: Madrid
Japan: Tokyo
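
The same post-processing can be done in a script instead of jq. This sketch assumes the response text is a JSON document matching the country_capitals schema above; format_capitals is a hypothetical helper, and the sample string is hand-written for illustration rather than captured model output.

```python
import json


def format_capitals(response_text: str) -> list[str]:
    """Turn a country_capitals JSON document into 'Country: Capital' lines."""
    data = json.loads(response_text)
    return [f"{item['country']}: {item['capital']}" for item in data["capitals"]]


# Hand-written sample that follows the schema; not real model output.
sample = (
    '{"capitals": ['
    '{"country": "France", "capital": "Paris"}, '
    '{"country": "Japan", "capital": "Tokyo"}]}'
)

for line in format_capitals(sample):
    print(line)
# France: Paris
# Japan: Tokyo
```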

Persistence Options

By default, all chat completions are saved to the database, including both the chat thread (conversation history) and the individual API request logs. Use these options to control what gets saved to the database:

--nohistory

Skip saving the chat thread (conversation history) to the database. The individual API request log is still recorded.

llms "What is the capital of France?" --nohistory

--nostore

Do not save anything to the database - no request log and no chat thread history. Implies --nohistory.

llms "What is the capital of France?" --nostore
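
The effect of the two flags can be summarized as a small truth table. The sketch below only models the behavior described above; saved_records is a hypothetical name and does not reflect how llms.py implements persistence internally.

```python
def saved_records(nohistory: bool = False, nostore: bool = False) -> dict[str, bool]:
    """Report which records are written to the database for one chat completion."""
    if nostore:
        # --nostore implies --nohistory: nothing is written at all.
        return {"chat_thread": False, "request_log": False}
    # --nohistory skips only the chat thread; the request log is still recorded.
    return {"chat_thread": not nohistory, "request_log": True}


for flags in ({}, {"nohistory": True}, {"nostore": True}):
    print(flags, "->", saved_records(**flags))
```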

Feb 9, 2026

Custom User and Agent Avatars

Personalize your chat experience with custom avatars for both yourself and AI agents. Upload images via the Settings page or manually add them to ~/.llms/users/ - supports .png, .svg, and auto-converts from other formats.

User Avatar

Agent Avatar

Compact Tool Calls with Smart Summarization

The new Compact Tool Calls feature automatically summarizes long tool arguments and outputs in the Chat UI to keep conversations concise while still providing access to important information as needed.

Tools Expanded

Tools Collapsed

Previously, even with long Tool Call Arguments minimized, only a few tool calls fit on a page. Now that they're collapsed by default, you can see more at a glance and expand only the ones you need.

Feb 8, 2026

Support for Voice Input

Added Voice Input extension with speech-to-text transcription via a microphone button or ALT+D shortcut, supporting three modes: local transcription with voxtype, custom transcribe executable, and cloud-based voxtral-mini-latest via Mistral.

  • Added tok/s metrics in Chat UI on a per-message and per-thread basis

Feb 5, 2026

Voxtral Audio Models

Added support for Mistral's Voxtral audio transcription models - use the audio input filter in the model selector to find them.

Both the Chat Completion and dedicated Audio Transcription APIs deliver impressive speed, with the dedicated transcription endpoint returning results near-instantly.

Voxtral Chat

Voxtral Audio Transcription

Compact Threads

Added Compact Threads feature for managing long conversations - it summarizes the current thread into a new, condensed thread targeting 30% of the original context size. The compact button appears when a conversation exceeds 10 messages or uses more than 40% of the model's context limit.

Compact Button

Compact Button Intensity

The compaction model and prompts are fully customizable in ~/.llms/llms.json.
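
The thresholds described above can be expressed as a short sketch. The function names and the token-based measure of context usage are assumptions made for illustration; the actual check in llms.py may be computed differently.

```python
def should_show_compact(message_count: int, used_tokens: int, context_limit: int) -> bool:
    """True when a thread exceeds 10 messages or uses more than 40% of the context limit."""
    over_message_limit = message_count > 10
    # Integer arithmetic avoids floating-point edge cases at exactly 40%.
    over_context_limit = used_tokens * 100 > context_limit * 40
    return over_message_limit or over_context_limit


def compact_target_tokens(used_tokens: int) -> int:
    """Target size of the compacted thread: 30% of the original context size."""
    return used_tokens * 30 // 100


print(should_show_compact(4, 60_000, 128_000))   # True  (about 47% of the context used)
print(should_show_compact(4, 10_000, 128_000))   # False (under both thresholds)
print(should_show_compact(11, 1_000, 128_000))   # True  (more than 10 messages)
print(compact_target_tokens(60_000))             # 18000
```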

  • Fix OpenRouter provider after models.dev switched to use @openrouter/ai-sdk-provider. Remove llms.json to reset to default configuration:
rm ~/.llms/llms.json

Feb 3, 2026

  • Removed duplicate filesystem tools from Core Tools; they're now only included in File System Tools

  • Add sort_by and max_result options in search_files and made path an optional parameter to improve utility and reduce tool use error rates. path now defaults to the first allowed directory (project dir).

Feb 3, 2026

  • Add support for overridable ClientTimeout limits in ~/.llms/llms.json:
{
    "limits": {
        "client_timeout": 120
    }
}
  • Show proceed button for assistant messages without content but with reasoning
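
For the client_timeout override above, the value can be merged into an existing configuration without disturbing other settings. This is a minimal sketch: with_client_timeout is a hypothetical helper, the other keys in the example are made up, and the unit of the value (presumably seconds) is an assumption not stated in these notes.

```python
import copy
import json


def with_client_timeout(config: dict, value: int) -> dict:
    """Return a copy of `config` with limits.client_timeout set, keeping other keys."""
    updated = copy.deepcopy(config)
    # Create the "limits" section only if it does not exist yet.
    updated.setdefault("limits", {})["client_timeout"] = value
    return updated


# Hypothetical existing configuration; these keys are invented for the example.
existing = {"limits": {"other_limit": 1}, "some_setting": True}
print(json.dumps(with_client_timeout(existing, 300), indent=4))
```

The deep copy leaves the original dictionary untouched, so the result can be compared against it before being written back to ~/.llms/llms.json.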

Feb 2, 2026

Multi User Skills

When Auth is enabled, each user manages their own skill collection at ~/.llms/user/<user>/skills and can enable or disable skills independently. Shared global & project-level skills remain accessible but read-only.
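
As a small illustration of the per-user layout, the path pattern above can be resolved as follows. user_skills_dir is a hypothetical helper written for this example and is not part of llms.py.

```python
from pathlib import Path
from typing import Optional


def user_skills_dir(user: str, home: Optional[Path] = None) -> Path:
    """Resolve ~/.llms/user/<user>/skills for the given user name."""
    base = home if home is not None else Path.home()
    return base / ".llms" / "user" / user / "skills"


# Each user resolves to a separate directory, so collections do not overlap.
print(user_skills_dir("alice", Path("/home/demo")))
print(user_skills_dir("bob", Path("/home/demo")))
```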

Jan 31, 2026

Jan 30, 2026

  • Support for tool calling for models returned by local Ollama instances

  • New openai-local provider for custom OpenAI-compatible endpoints

  • Fix computer tool issues in Docker by only loading computer tool if run in environment with a display

Jan 29, 2026

Skills Management

Added a full Skills Management UI for creating, editing, and deleting skills directly from the browser.

Skills package domain-specific instructions, scripts, references & assets that enhance your AI agent.

Browse & Install Skills

Added a Skill Browser with access to the top 5,000 community skills from skills.sh. Search, browse, and install pre-built skills directly into your personal collection.

Browse Skills

Installing Skill

Jan 28, 2026

  • Use a barebones fallback markdown renderer when markdown renderers like KaTeX fail

  • Use sanitizeHtml to avoid breaking layout when displaying rendered html

Jan 26, 2026

  • Add copy button to TextViewer popover menu

  • Add proceed and retry buttons at the bottom of Threads to continue agent loop

  • Add filesystem tools in computer extension

  • Add a simple sendUserMessage API in UI to simulate a new user message on the thread

  • Implement TextViewer component for displaying Tool Args, Tool Output + SystemPrompt

Jan 24, 2026

  • Auto collapse long tool args content and add ability to min/maximize text content

Jan 23, 2026


v3 Released

See v3 release notes for details on the major new features and improvements in v3.