llms.py
Features

CLI

A command-line interface for all your LLMs

Using the CLI

Ask questions directly from the command line:

# Simple query
llms "What is the capital of France?"

# With specific model
llms -m grok-4-fast "Explain quantum computing"

# With system prompt
llms -s "You are a helpful coding assistant" "How do I reverse a string in Python?"

# With image
llms --image photo.jpg "What's in this image?"

# With audio
llms --audio recording.mp3 "Transcribe this audio"

# With file
llms --file document.pdf "Summarize this PDF"

Configure Default Model

Set your preferred default model:

llms --default grok-4-fast

Common CLI Examples

Text Generation

# Basic chat
llms "Explain quantum computing in simple terms"

# With specific model
llms -m gemini-2.5-pro "Write a Python function to sort a list"

# With system prompt
llms -s "You are a quantum expert" "Explain quantum computing"

# Display full JSON response
llms "Hello" --raw

Tool Calling & Function Calling

All registered tools are automatically available in CLI mode:

# Use all available tools (default)
llms "Read the file data.txt and calculate the sum"

# Use specific tools
llms --tools calc,get_current_time "What time is it in Tokyo and what's 15% of 230?"

# Disable all tools
llms --tools none "Tell me a joke"

Image Analysis

# Local image
llms --image screenshot.png "What's in this image?"

# Remote image
llms --image https://example.com/photo.jpg "Describe this photo"

# With specific vision model
llms -m gemini-2.5-flash --image chart.png "Analyze this chart"

Audio Transcription

# Transcribe audio
llms --audio meeting.wav "Summarize this meeting recording"

# With specific audio model
llms -m gpt-4o-audio-preview --audio interview.mp3 "Extract main topics"

Document Processing

# Summarize PDF
llms --file report.pdf "Summarize the key points"

# Extract data
llms -m gemini-flash-latest --file policy.pdf "Extract action items"

Image Generation

Generate images directly from the CLI:

# Generate image with default model
llms --out image "A serene mountain landscape at sunset"

# Generate with specific model
llms -m "gemini-2.5-flash-image" --out image "Logo for a tech startup"

# All generated images are saved to ~/.llms/cache
llms -m "Gemini 2.5 Flash Image" --out image "cat in a hat"

Audio Generation

Generate audio with Text-to-Speech models:

# Generate audio with default TTS model
llms --out audio "Welcome to our podcast"

# With specific TTS model
llms -m gemini-2.5-pro-preview-tts --out audio "Merry Christmas"

# All generated audio is saved to ~/.llms/cache
llms -m gemini-2.5-flash-preview-tts --out audio "Hello world"

CLI Reference

Basic Usage

llms [OPTIONS] [PROMPT]

Chat

# Simple query
llms "What is the capital of France?"

# With specific model
llms -m grok-4-fast "Explain quantum computing"

# With system prompt
llms -s "You are a helpful assistant" "Write a Python function"

Server

# Start server on port 8000
llms --serve 8000

# With verbose logging
llms --serve 8000 --verbose

Configuration

# Initialize configuration
llms --init

# List providers and models
llms --list
llms ls

# Enable/disable providers
llms --enable groq openai
llms --disable ollama

# Set default model
llms --default grok-4-fast

# Check provider status
llms --check groq

# Update provider definitions from models.dev
llms --update-providers

Extensions Management

# List available extensions
llms --add

# Install an extension
llms --add fast_mcp

# Install from GitHub
llms --add github-user/repo-name

# List installed extensions
llms --remove

# Uninstall an extension
llms --remove fast_mcp

Options Reference

Model & Provider Options

-m, --model MODEL

Specify which model to use:

llms -m grok-4-fast "Hello"
llms -m gemini-2.5-pro "Explain quantum physics"

-s, --system PROMPT

Set system prompt:

llms -s "You are a helpful coding assistant" "How do I sort an array?"

Input Options

--image IMAGE

Process image input:

llms --image photo.jpg "What's in this image?"
llms --image https://example.com/chart.png "Analyze this chart"

--audio AUDIO

Process audio input:

llms --audio recording.mp3 "Transcribe this"
llms --audio meeting.wav "Summarize this meeting"

--file FILE

Process file/document input:

llms --file document.pdf "Summarize this PDF"
llms --file report.pdf "Extract key points"

--chat REQUEST

Use a custom chat completion request template:

llms --chat request.json
llms --chat request.json "Override prompt"
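
A request file is a standard OpenAI-style chat completion body. A minimal sketch of what request.json might contain (the layout below is illustrative; beyond model and messages, the accepted fields are whatever the target provider supports):

```json
{
  "model": "grok-4-fast",
  "messages": [
    { "role": "system", "content": "You are a terse assistant." },
    { "role": "user", "content": "Capital of France?" }
  ],
  "temperature": 0.3
}
```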

Standard Input

llms now accepts OpenAI-compatible Chat Completion requests via standard input, making it easy to integrate into shell pipelines and scripts.

When JSON is piped in, llms detects it automatically; no extra flags are needed:

cat request.json | llms

Build requests inline with a heredoc:

llms <<EOF
{
  "model": "Minimax M2.5",
  "messages": [
    { "role": "user", "content": "Capital of France?" }
  ]
}
EOF

Combine with other CLI tools to generate requests dynamically:

echo '{"messages":[{"role":"user","content":"Summarize:'"$(cat notes.txt)"'"}]}' | llms
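
One caveat with the echo approach above: if notes.txt contains quotes or newlines, the spliced string is no longer valid JSON. A sketch that lets jq do the escaping instead (assumes jq 1.6+ for --rawfile):

```shell
# Build the request with jq so the file contents are JSON-escaped correctly
printf 'meeting notes with "quotes"\n' > notes.txt
jq -n --rawfile notes notes.txt \
  '{messages: [{role: "user", content: ("Summarize: " + $notes)}]}' > request.json
cat request.json
# cat request.json | llms
```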

This pairs well with structured outputs support and jq to build end-to-end JSON pipelines:

(llms <<EOF
{
    "model": "moonshotai/kimi-k2-instruct",
    "messages": [{"role":"user", "content":"Return capital cities for: France, Italy, Spain, Japan." }],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "country_capitals",
            "schema": {
                "type": "object",
                "properties": {
                    "capitals": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "country": { "type": "string" },
                                "capital": { "type": "string" }
                            },
                            "required": ["country","capital"]
                        }
                    }
                },
                "required": ["capitals"]
            }
        }
    }
}
EOF
) | jq -r '.capitals[] | "\(.country): \(.capital)"'

Output:

France: Paris
Italy: Rome
Spain: Madrid
Japan: Tokyo

Request Options

--args PARAMS

Add custom parameters to the request (URL-encoded):

llms --args "temperature=0.7&seed=111" "Hello"
llms --args "max_completion_tokens=50" "Tell me a joke"
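
Because --args takes a URL-encoded query string, values containing spaces or & need encoding. One way to build the string safely is Python's urlencode (a convenience sketch, not part of llms itself):

```shell
# Build a URL-encoded parameter string with Python's urlencode
ARGS=$(python3 -c "from urllib.parse import urlencode; print(urlencode({'temperature': 0.7, 'seed': 111}))")
echo "$ARGS"   # temperature=0.7&seed=111
# llms --args "$ARGS" "Hello"
```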

Output Options

--raw

Display full JSON response:

llms --raw "What is 2+2?"

--verbose

Enable detailed logging:

llms --verbose "Hello"
llms --serve 8000 --verbose

--logprefix PREFIX

Custom log message prefix:

llms --verbose --logprefix "[DEBUG] " "Hello"

Server Options

--serve PORT

Start HTTP server:

llms --serve 8000
llms --serve 3000 --verbose

--root PATH

Custom root directory for UI files:

llms --serve 8000 --root /path/to/ui

Configuration Options

--config FILE

Use custom configuration file:

llms --config /path/to/config.json "Hello"

--init

Create default configuration:

llms --init

--list, ls

List providers and models:

llms --list
llms ls
llms ls groq anthropic

--enable PROVIDER

Enable one or more providers:

llms --enable groq
llms --enable openai anthropic grok

--disable PROVIDER

Disable one or more providers:

llms --disable ollama
llms --disable openai anthropic

--default MODEL

Set default model:

llms --default grok-4-fast
llms --default gemini-2.5-pro

--check PROVIDER [MODELS...]

Check provider status:

llms --check groq
llms --check groq kimi-k2 llama4:400b

--update-providers

Update provider definitions from models.dev:

llms --update-providers

--tools TOOLS

Enable specific tools for function calling:

# Use all tools (default)
llms --tools all "What time is it and calculate 15% of 230?"

# Use specific tools
llms --tools calc,get_current_time "What time is it in Tokyo?"

# Disable all tools
llms --tools none "Tell me a joke"

--out OUTPUT_TYPE

Generate media output (image or audio):

# Generate images
llms --out image "A serene mountain landscape"

# Generate audio
llms --out audio "Welcome message"

--add [EXTENSION]

Install or list available extensions:

# List available extensions
llms --add

# Install an extension
llms --add fast_mcp

# Install from GitHub
llms --add github-user/repo-name

--remove [EXTENSION]

Uninstall or list installed extensions:

# List installed extensions
llms --remove

# Uninstall an extension
llms --remove fast_mcp

Persistence Options

By default, all chat completions are saved to the database, including both the chat thread (conversation history) and the individual API request logs. Use these options to control what is saved:

--nohistory

Skip saving the chat thread (conversation history) to the database. The individual API request log is still recorded.

llms "What is the capital of France?" --nohistory

--nostore

Do not save anything to the database: no request log and no chat thread history. Implies --nohistory.

llms "What is the capital of France?" --nostore

Help

-h, --help

Show help message:

llms --help

Examples

Text Generation

# Basic chat
llms "Explain quantum computing"

# With specific model
llms -m gemini-2.5-pro "Write a Python function to sort a list"

# With system prompt
llms -s "You are a quantum expert" "Explain entanglement"

# With custom parameters
llms --args "temperature=0.3&max_completion_tokens=100" "Tell me a joke"

Image Analysis

# Default image template
llms --image ./screenshot.png

# With prompt
llms --image ./chart.png "Analyze this chart"

# With specific model
llms -m qwen2.5vl --image document.jpg "Extract text"

# Remote image
llms --image https://example.com/photo.jpg "Describe this"

Audio Processing

# Default audio template (transcribe)
llms --audio recording.mp3

# With prompt
llms --audio meeting.wav "Summarize this meeting"

# With specific model
llms -m gpt-4o-audio-preview --audio interview.mp3 "Extract topics"

Document Processing

# Default file template (summarize)
llms --file document.pdf

# With prompt
llms --file policy.pdf "Summarize key changes"

# With specific model
llms -m gpt-5 --file report.pdf "Extract action items"

Custom Templates

# Use custom chat template
llms --chat custom-request.json "My prompt"

# Image with custom template
llms --chat image-request.json --image photo.jpg

# Audio with custom template
llms --chat audio-request.json --audio recording.mp3

Server Mode

# Start server
llms --serve 8000

# With verbose logging
llms --serve 8000 --verbose

# Custom port
llms --serve 3000

# Custom UI root
llms --serve 8000 --root ./my-ui

Configuration Management

# Initialize config
llms --init

# List all providers
llms ls

# List specific providers
llms ls groq anthropic openai

# Enable free providers
llms --enable openrouter_free google_free groq

# Enable paid providers
llms --enable openai anthropic grok

# Disable provider
llms --disable ollama

# Set default model
llms --default grok-4-fast

# Check provider status
llms --check groq
llms --check groq kimi-k2 llama4:400b gpt-oss:120b

# Update provider definitions from models.dev (auto-updated daily)
llms --update-providers

Extensions Management

# List available extensions from github.com/llmspy
llms --add

# Install an extension
llms --add fast_mcp

# Install a 3rd-party extension from GitHub
llms --add github-user/repo-name

# List installed extensions
llms --remove

# Uninstall an extension
llms --remove fast_mcp

Tool Calling & Function Calling

# Use all available tools (default)
llms "Read the file data.txt and calculate the sum"

# Use specific tools
llms --tools calc,get_current_time "What time is it in Tokyo and what's 15% of 230?"

# Disable all tools
llms --tools none "Tell me a joke"

# Tools work with any model that supports function calling
llms -m gpt-4o --tools calc "Calculate the area of a circle with radius 5"

Image Generation

# Generate image with default model
llms --out image "A serene mountain landscape at sunset"

# Generate with specific model by ID
llms -m "gemini-2.5-flash-image" --out image "Logo for a tech startup"

# Generate with specific model by name
llms -m "Gemini 2.5 Flash Image" --out image "cat in a hat"

# Images are saved to ~/.llms/cache with local path and HTTP URL

Audio Generation

# Generate audio with default TTS model
llms --out audio "Welcome to our podcast"

# With specific TTS model
llms -m gemini-2.5-pro-preview-tts --out audio "Merry Christmas"

# Generate with Flash TTS
llms -m gemini-2.5-flash-preview-tts --out audio "Hello world"

# Audio files are saved to ~/.llms/cache with local path and HTTP URL

Environment Variables

API Keys

OPENROUTER_API_KEY     # OpenRouter
GEMINI_API_KEY         # Gemini (Google)
ANTHROPIC_API_KEY      # Claude (Anthropic)
OPENAI_API_KEY         # OpenAI
GROQ_API_KEY           # Groq API
ZHIPU_API_KEY          # Z.ai Coding Plan
MINIMAX_API_KEY        # MiniMax
DASHSCOPE_API_KEY      # Qwen (Alibaba)
XAI_API_KEY            # Grok (X.AI)
NVIDIA_API_KEY         # NVIDIA NIM
GITHUB_TOKEN           # GitHub Copilot Models
MISTRAL_API_KEY        # Mistral
DEEPSEEK_API_KEY       # DeepSeek
CHUTES_API_KEY         # chutes.ai OSS LLM and Image Models
HF_TOKEN               # Hugging Face
FIREWORKS_API_KEY      # fireworks.ai OSS Models
CODESTRAL_API_KEY      # Codestral (Mistral)
LMSTUDIO_API_KEY       # Placeholder for local LM Studio
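
To check which of these keys are actually visible in the current shell, a small diagnostic loop over the names above (llms --check remains the authoritative provider check):

```shell
# Report which provider API keys are set in this shell
for key in OPENROUTER_API_KEY GEMINI_API_KEY ANTHROPIC_API_KEY \
           OPENAI_API_KEY GROQ_API_KEY XAI_API_KEY; do
  eval "val=\${$key}"
  if [ -n "$val" ]; then echo "$key is set"; else echo "$key is unset"; fi
done
```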

Other Settings

VERBOSE=1              # Enable verbose logging
DEBUG=1                # Enable DEBUG logging

Next Steps