llms.py

CLI

A command-line interface for all your LLMs

Using the CLI

Ask questions directly from the command line:

# Simple query
llms "What is the capital of France?"

# With specific model
llms -m grok-4-fast "Explain quantum computing"

# With system prompt
llms -s "You are a helpful coding assistant" "How do I reverse a string in Python?"

# With image
llms --image photo.jpg "What's in this image?"

# With audio
llms --audio recording.mp3 "Transcribe this audio"

# With file
llms --file document.pdf "Summarize this PDF"

Configure Default Model

Set your preferred default model:

llms --default grok-4-fast
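
Once set, queries that don't pass -m use this model:

# Uses the configured default (grok-4-fast)
llms "Hello"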

Common CLI Examples

Text Generation

# Basic chat
llms "Explain quantum computing in simple terms"

# With specific model
llms -m gemini-2.5-pro "Write a Python function to sort a list"

# With system prompt
llms -s "You are a quantum expert" "Explain quantum computing"

# Display full JSON response
llms "Hello" --raw

Tool Calling & Function Calling

All registered tools are automatically available in CLI mode:

# Use all available tools (default)
llms "Read the file data.txt and calculate the sum"

# Use specific tools
llms --tools calc,get_current_time "What time is it in Tokyo and what's 15% of 230?"

# Disable all tools
llms --tools none "Tell me a joke"

Image Analysis

# Local image
llms --image screenshot.png "What's in this image?"

# Remote image
llms --image https://example.com/photo.jpg "Describe this photo"

# With specific vision model
llms -m gemini-2.5-flash --image chart.png "Analyze this chart"

Audio Transcription

# Transcribe audio
llms --audio meeting.wav "Summarize this meeting recording"

# With specific audio model
llms -m gpt-4o-audio-preview --audio interview.mp3 "Extract main topics"

Document Processing

# Summarize PDF
llms --file report.pdf "Summarize the key points"

# Extract data
llms -m gemini-flash-latest --file policy.pdf "Extract action items"

Image Generation

Generate images directly from the CLI:

# Generate image with default model
llms --out image "A serene mountain landscape at sunset"

# Generate with specific model
llms -m "gemini-2.5-flash-image" --out image "Logo for a tech startup"

# All generated images are saved to ~/.llms/cache
llms -m "Gemini 2.5 Flash Image" --out image "cat in a hat"

Audio Generation

Generate audio with Text-to-Speech models:

# Generate audio with default TTS model
llms --out audio "Welcome to our podcast"

# With specific TTS model
llms -m gemini-2.5-pro-preview-tts --out audio "Merry Christmas"

# All generated audio is saved to ~/.llms/cache
llms -m gemini-2.5-flash-preview-tts --out audio "Hello world"

CLI Reference

Basic Usage

llms [OPTIONS] [PROMPT]

Chat

# Simple query
llms "What is the capital of France?"

# With specific model
llms -m grok-4-fast "Explain quantum computing"

# With system prompt
llms -s "You are a helpful assistant" "Write a Python function"

Server

# Start server on port 8000
llms --serve 8000

# With verbose logging
llms --serve 8000 --verbose
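
Once running, the server can be queried over HTTP. A minimal curl sketch, assuming the server exposes an OpenAI-compatible /v1/chat/completions endpoint:

# Query the local server (assumes an OpenAI-compatible endpoint)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "grok-4-fast", "messages": [{"role": "user", "content": "Hello"}]}'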

Configuration

# Initialize configuration
llms --init

# List providers and models
llms --list
llms ls

# Enable/disable providers
llms --enable groq openai
llms --disable ollama

# Set default model
llms --default grok-4-fast

# Check provider status
llms --check groq

# Update provider definitions from models.dev
llms --update-providers

Extensions Management

# List available extensions
llms --add

# Install an extension
llms --add fast_mcp

# Install from GitHub
llms --add github-user/repo-name

# List installed extensions
llms --remove

# Uninstall an extension
llms --remove fast_mcp

Options Reference

Model & Provider Options

-m, --model MODEL

Specify which model to use:

llms -m grok-4-fast "Hello"
llms -m gemini-2.5-pro "Explain quantum physics"

-s, --system PROMPT

Set system prompt:

llms -s "You are a helpful coding assistant" "How do I sort an array?"

Input Options

--image IMAGE

Process image input:

llms --image photo.jpg "What's in this image?"
llms --image https://example.com/chart.png "Analyze this chart"

--audio AUDIO

Process audio input:

llms --audio recording.mp3 "Transcribe this"
llms --audio meeting.wav "Summarize this meeting"

--file FILE

Process file/document input:

llms --file document.pdf "Summarize this PDF"
llms --file report.pdf "Extract key points"

--chat REQUEST

Use a custom chat completion request from a JSON file:

llms --chat request.json
llms --chat request.json "Override prompt"
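
A minimal sketch of what request.json might contain, assuming it follows the OpenAI chat completion schema (hypothetical example; supported fields may vary by provider):

{
  "model": "grok-4-fast",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.7
}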

Request Options

--args PARAMS

Add custom parameters to the request (URL-encoded):

llms --args "temperature=0.7&seed=111" "Hello"
llms --args "max_completion_tokens=50" "Tell me a joke"

Output Options

--raw

Display full JSON response:

llms --raw "What is 2+2?"

--verbose

Enable detailed logging:

llms --verbose "Hello"
llms --serve 8000 --verbose

--logprefix PREFIX

Custom log message prefix:

llms --verbose --logprefix "[DEBUG] " "Hello"

Server Options

--serve PORT

Start HTTP server:

llms --serve 8000
llms --serve 3000 --verbose

--root PATH

Custom root directory for UI files:

llms --serve 8000 --root /path/to/ui

Configuration Options

--config FILE

Use custom configuration file:

llms --config /path/to/config.json "Hello"

--init

Create default configuration:

llms --init

--list, ls

List providers and models:

llms --list
llms ls
llms ls groq anthropic

--enable PROVIDER

Enable one or more providers:

llms --enable groq
llms --enable openai anthropic grok

--disable PROVIDER

Disable one or more providers:

llms --disable ollama
llms --disable openai anthropic

--default MODEL

Set default model:

llms --default grok-4-fast
llms --default gemini-2.5-pro

--check PROVIDER [MODELS...]

Check provider status:

llms --check groq
llms --check groq kimi-k2 llama4:400b

--update-providers

Update provider definitions from models.dev:

llms --update-providers

--tools TOOLS

Enable specific tools for function calling:

# Use all tools (default)
llms --tools all "What time is it and calculate 15% of 230?"

# Use specific tools
llms --tools calc,get_current_time "What time is it in Tokyo?"

# Disable all tools
llms --tools none "Tell me a joke"

--out OUTPUT_TYPE

Generate media output (image or audio):

# Generate images
llms --out image "A serene mountain landscape"

# Generate audio
llms --out audio "Welcome message"

--add [EXTENSION]

Install or list available extensions:

# List available extensions
llms --add

# Install an extension
llms --add fast_mcp

# Install from GitHub
llms --add github-user/repo-name

--remove [EXTENSION]

Uninstall or list installed extensions:

# List installed extensions
llms --remove

# Uninstall an extension
llms --remove fast_mcp

Help

-h, --help

Show help message:

llms --help

Examples

Text Generation

# Basic chat
llms "Explain quantum computing"

# With specific model
llms -m gemini-2.5-pro "Write a Python function to sort a list"

# With system prompt
llms -s "You are a quantum expert" "Explain entanglement"

# With custom parameters
llms --args "temperature=0.3&max_completion_tokens=100" "Tell me a joke"

Image Analysis

# Default image template
llms --image ./screenshot.png

# With prompt
llms --image ./chart.png "Analyze this chart"

# With specific model
llms -m qwen2.5vl --image document.jpg "Extract text"

# Remote image
llms --image https://example.com/photo.jpg "Describe this"

Audio Processing

# Default audio template (transcribe)
llms --audio recording.mp3

# With prompt
llms --audio meeting.wav "Summarize this meeting"

# With specific model
llms -m gpt-4o-audio-preview --audio interview.mp3 "Extract topics"

Document Processing

# Default file template (summarize)
llms --file document.pdf

# With prompt
llms --file policy.pdf "Summarize key changes"

# With specific model
llms -m gpt-5 --file report.pdf "Extract action items"

Custom Templates

# Use custom chat template
llms --chat custom-request.json "My prompt"

# Image with custom template
llms --chat image-request.json --image photo.jpg

# Audio with custom template
llms --chat audio-request.json --audio recording.mp3

Server Mode

# Start server
llms --serve 8000

# With verbose logging
llms --serve 8000 --verbose

# Custom port
llms --serve 3000

# Custom UI root
llms --serve 8000 --root ./my-ui

Configuration Management

# Initialize config
llms --init

# List all providers
llms ls

# List specific providers
llms ls groq anthropic openai

# Enable free providers
llms --enable openrouter_free google_free groq

# Enable paid providers
llms --enable openai anthropic grok

# Disable provider
llms --disable ollama

# Set default model
llms --default grok-4-fast

# Check provider status
llms --check groq
llms --check groq kimi-k2 llama4:400b gpt-oss:120b

# Update provider definitions from models.dev (auto-updated daily)
llms --update-providers

Extensions Management

# List available extensions from github.com/llmspy
llms --add

# Install an extension
llms --add fast_mcp

# Install a 3rd-party extension from GitHub
llms --add github-user/repo-name

# List installed extensions
llms --remove

# Uninstall an extension
llms --remove fast_mcp

Tool Calling & Function Calling

# Use all available tools (default)
llms "Read the file data.txt and calculate the sum"

# Use specific tools
llms --tools calc,get_current_time "What time is it in Tokyo and what's 15% of 230?"

# Disable all tools
llms --tools none "Tell me a joke"

# Tools work with any model that supports function calling
llms -m gpt-4o --tools calc "Calculate the area of a circle with radius 5"
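
To inspect a tool-call round trip, combine --tools with --raw; the full JSON response should include any tool calls the model made:

# Inspect tool calls in the raw JSON response
llms --raw --tools calc "What's 15% of 230?"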

Image Generation

# Generate image with default model
llms --out image "A serene mountain landscape at sunset"

# Generate with specific model by ID
llms -m "gemini-2.5-flash-image" --out image "Logo for a tech startup"

# Generate with specific model by name
llms -m "Gemini 2.5 Flash Image" --out image "cat in a hat"

# Images are saved to ~/.llms/cache with local path and HTTP URL

Audio Generation

# Generate audio with default TTS model
llms --out audio "Welcome to our podcast"

# With specific TTS model
llms -m gemini-2.5-pro-preview-tts --out audio "Merry Christmas"

# Generate with Flash TTS
llms -m gemini-2.5-flash-preview-tts --out audio "Hello world"

# Audio files are saved to ~/.llms/cache with local path and HTTP URL

Environment Variables

API Keys

OPENROUTER_API_KEY     # OpenRouter
GEMINI_API_KEY         # Gemini (Google)
ANTHROPIC_API_KEY      # Claude (Anthropic)
OPENAI_API_KEY         # OpenAI
GROQ_API_KEY           # Groq API
ZHIPU_API_KEY          # Z.ai Coding Plan
MINIMAX_API_KEY        # MiniMax
DASHSCOPE_API_KEY      # Qwen (Alibaba)
XAI_API_KEY            # Grok (X.AI)
NVIDIA_API_KEY         # NVIDIA NIM
GITHUB_TOKEN           # GitHub Copilot Models
MISTRAL_API_KEY        # Mistral
DEEPSEEK_API_KEY       # DeepSeek
CHUTES_API_KEY         # chutes.ai OSS LLM and Image Models
HF_TOKEN               # Hugging Face
FIREWORKS_API_KEY      # fireworks.ai OSS Models
CODESTRAL_API_KEY      # Codestral (Mistral)
LMSTUDIO_API_KEY       # Placeholder for local LM Studio
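
Export the key for each provider you enable before use, for example:

# Set a provider key, enable the provider, then query one of its models
export GROQ_API_KEY="..."
llms --enable groq
llms -m kimi-k2 "Hello"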

Other Settings

VERBOSE=1              # Enable verbose logging
DEBUG=1                # Enable DEBUG logging
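
Both can be set per invocation:

# One-off verbose run
VERBOSE=1 llms "Hello"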
