CLI
A command-line interface for all your LLMs
Using the CLI
Ask questions directly from the command line:
# Simple query
llms "What is the capital of France?"
# With specific model
llms -m grok-4-fast "Explain quantum computing"
# With system prompt
llms -s "You are a helpful coding assistant" "How do I reverse a string in Python?"
# With image
llms --image photo.jpg "What's in this image?"
# With audio
llms --audio recording.mp3 "Transcribe this audio"
# With file
llms --file document.pdf "Summarize this PDF"
Configure Default Model
Set your preferred default model:
llms --default grok-4-fast
Common CLI Examples
Text Generation
# Basic chat
llms "Explain quantum computing in simple terms"
# With specific model
llms -m gemini-2.5-pro "Write a Python function to sort a list"
# With system prompt
llms -s "You are a quantum expert" "Explain quantum computing"
# Display full JSON response
llms "Hello" --rawTool Calling & Function Calling
All registered tools are automatically available in CLI mode:
# Use all available tools (default)
llms "Read the file data.txt and calculate the sum"
# Use specific tools
llms --tools calc,get_current_time "What time is it in Tokyo and what's 15% of 230?"
# Disable all tools
llms --tools none "Tell me a joke"
Image Analysis
# Local image
llms --image screenshot.png "What's in this image?"
# Remote image
llms --image https://example.com/photo.jpg "Describe this photo"
# With specific vision model
llms -m gemini-2.5-flash --image chart.png "Analyze this chart"
Audio Transcription
# Transcribe audio
llms --audio meeting.wav "Summarize this meeting recording"
# With specific audio model
llms -m gpt-4o-audio-preview --audio interview.mp3 "Extract main topics"
Document Processing
# Summarize PDF
llms --file report.pdf "Summarize the key points"
# Extract data
llms -m gemini-flash-latest --file policy.pdf "Extract action items"
Image Generation
Generate images directly from the CLI:
# Generate image with default model
llms --out image "A serene mountain landscape at sunset"
# Generate with specific model
llms -m "gemini-2.5-flash-image" --out image "Logo for a tech startup"
# All generated images are saved to ~/.llms/cache
llms -m "Gemini 2.5 Flash Image" --out image "cat in a hat"Audio Generation
Generate audio with Text-to-Speech models:
# Generate audio with default TTS model
llms --out audio "Welcome to our podcast"
# With specific TTS model
llms -m gemini-2.5-pro-preview-tts --out audio "Merry Christmas"
# All generated audio is saved to ~/.llms/cache
llms -m gemini-2.5-flash-preview-tts --out audio "Hello world"
CLI Reference
Basic Usage
llms [OPTIONS] [PROMPT]
Chat
# Simple query
llms "What is the capital of France?"
# With specific model
llms -m grok-4-fast "Explain quantum computing"
# With system prompt
llms -s "You are a helpful assistant" "Write a Python function"Server
# Start server on port 8000
llms --serve 8000
# With verbose logging
llms --serve 8000 --verbose
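Once the server is up you can probe it with curl; a minimal sketch, assuming it exposes an OpenAI-compatible /v1/chat/completions endpoint (an assumption to verify for your build):
# Assumes an OpenAI-compatible chat endpoint; adjust the path if needed
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'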
Configuration
# Initialize configuration
llms --init
# List providers and models
llms --list
llms ls
# Enable/disable providers
llms --enable groq openai
llms --disable ollama
# Set default model
llms --default grok-4-fast
# Check provider status
llms --check groq
# Update provider definitions from models.dev
llms --update-providers
Extensions Management
# List available extensions
llms --add
# Install an extension
llms --add fast_mcp
# Install from GitHub
llms --add github-user/repo-name
# List installed extensions
llms --remove
# Uninstall an extension
llms --remove fast_mcp
Options Reference
Model & Provider Options
-m, --model MODEL
Specify which model to use:
llms -m grok-4-fast "Hello"
llms -m gemini-2.5-pro "Explain quantum physics"
-s, --system PROMPT
Set system prompt:
llms -s "You are a helpful coding assistant" "How do I sort an array?"Input Options
--image IMAGE
Process image input:
llms --image photo.jpg "What's in this image?"
llms --image https://example.com/chart.png "Analyze this chart"
--audio AUDIO
Process audio input:
llms --audio recording.mp3 "Transcribe this"
llms --audio meeting.wav "Summarize this meeting"
--file FILE
Process file/document input:
llms --file document.pdf "Summarize this PDF"
llms --file report.pdf "Extract key points"
--chat REQUEST
Use custom chat completion request:
llms --chat request.json
llms --chat request.json "Override prompt"
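A minimal request.json sketch, assuming --chat accepts the same OpenAI-compatible Chat Completion shape that standard input does (see below); the model and temperature values are illustrative:
{
  "model": "grok-4-fast",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant" },
    { "role": "user", "content": "Hello" }
  ],
  "temperature": 0.7
}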
Standard Input
llms now accepts OpenAI-compatible Chat Completion requests via standard input, making it easy to integrate into shell pipelines and scripts.
When JSON is piped in, llms detects it automatically; no extra flags are needed:
cat request.json | llms
Build requests inline with a heredoc:
llms <<EOF
{
  "model": "Minimax M2.5",
  "messages": [
    { "role": "user", "content": "Capital of France?" }
  ]
}
EOF
Combine with other CLI tools to generate requests dynamically:
echo '{"messages":[{"role":"user","content":"Summarize:'"$(cat notes.txt)"'"}]}' | llms
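Inline splicing like this breaks if notes.txt contains quotes or newlines; a safer sketch uses jq's --arg, which performs the JSON escaping:
jq -n --arg text "Summarize: $(cat notes.txt)" \
  '{messages: [{role: "user", content: $text}]}' | llms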
This pairs well with structured outputs support and jq to build end-to-end JSON pipelines:
(llms <<EOF
{
  "model": "moonshotai/kimi-k2-instruct",
  "messages": [{ "role": "user", "content": "Return capital cities for: France, Italy, Spain, Japan." }],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "country_capitals",
      "schema": {
        "type": "object",
        "properties": {
          "capitals": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "country": { "type": "string" },
                "capital": { "type": "string" }
              },
              "required": ["country", "capital"]
            }
          }
        },
        "required": ["capitals"]
      }
    }
  }
}
EOF
) | jq -r '.capitals[] | "\(.country): \(.capital)"'
Output:
France: Paris
Italy: Rome
Spain: Madrid
Japan: Tokyo
Request Options
--args PARAMS
Add custom parameters to request (URL-encoded):
llms --args "temperature=0.7&seed=111" "Hello"
llms --args "max_completion_tokens=50" "Tell me a joke"Output Options
Output Options
--raw
Display full JSON response:
llms --raw "What is 2+2?"--verbose
--verbose
Enable detailed logging:
llms --verbose "Hello"
llms --serve 8000 --verbose
--logprefix PREFIX
Custom log message prefix:
llms --verbose --logprefix "[DEBUG] " "Hello"
Server Options
--serve PORT
Start HTTP server:
llms --serve 8000
llms --serve 3000 --verbose
--root PATH
Custom root directory for UI files:
llms --serve 8000 --root /path/to/ui
Configuration Options
--config FILE
Use custom configuration file:
llms --config /path/to/config.json "Hello"
--init
Create default configuration:
llms --init
--list, ls
List providers and models:
llms --list
llms ls
llms ls groq anthropic
--enable PROVIDER
Enable one or more providers:
llms --enable groq
llms --enable openai anthropic grok
--disable PROVIDER
Disable one or more providers:
llms --disable ollama
llms --disable openai anthropic
--default MODEL
Set default model:
llms --default grok-4-fast
llms --default gemini-2.5-pro
--check PROVIDER [MODELS...]
Check provider status:
llms --check groq
llms --check groq kimi-k2 llama4:400b
--update-providers
Update provider definitions from models.dev:
llms --update-providers
--tools TOOLS
Enable specific tools for function calling:
# Use all tools (default)
llms --tools all "What time is it and calculate 15% of 230?"
# Use specific tools
llms --tools calc,get_current_time "What time is it in Tokyo?"
# Disable all tools
llms --tools none "Tell me a joke"
--out OUTPUT_TYPE
Generate media output (image or audio):
# Generate images
llms --out image "A serene mountain landscape"
# Generate audio
llms --out audio "Welcome message"--add [EXTENSION]
Install or list available extensions:
# List available extensions
llms --add
# Install an extension
llms --add fast_mcp
# Install from GitHub
llms --add github-user/repo-name
--remove [EXTENSION]
Uninstall or list installed extensions:
# List installed extensions
llms --remove
# Uninstall an extension
llms --remove fast_mcp
Persistence Options
By default, every chat completion is saved to the database: both the chat thread (conversation history) and the individual API request log. Use these options to control what gets saved:
--nohistory
Skip saving the chat thread (conversation history) to the database. The individual API request log is still recorded.
llms "What is the capital of France?" --nohistory--nostore
Do not save anything to the database - no request log and no chat thread history. Implies --nohistory.
llms "What is the capital of France?" --nostoreHelp
Help
-h, --help
Show help message:
llms --help
Examples
Text Generation
# Basic chat
llms "Explain quantum computing"
# With specific model
llms -m gemini-2.5-pro "Write a Python function to sort a list"
# With system prompt
llms -s "You are a quantum expert" "Explain entanglement"
# With custom parameters
llms --args "temperature=0.3&max_completion_tokens=100" "Tell me a joke"Image Analysis
# Default image template
llms --image ./screenshot.png
# With prompt
llms --image ./chart.png "Analyze this chart"
# With specific model
llms -m qwen2.5vl --image document.jpg "Extract text"
# Remote image
llms --image https://example.com/photo.jpg "Describe this"
Audio Processing
# Default audio template (transcribe)
llms --audio recording.mp3
# With prompt
llms --audio meeting.wav "Summarize this meeting"
# With specific model
llms -m gpt-4o-audio-preview --audio interview.mp3 "Extract topics"
Document Processing
# Default file template (summarize)
llms --file document.pdf
# With prompt
llms --file policy.pdf "Summarize key changes"
# With specific model
llms -m gpt-5 --file report.pdf "Extract action items"
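For batch work, llms composes with ordinary shell loops; a sketch that summarizes every PDF in a directory (paths are illustrative):
for f in reports/*.pdf; do
  echo "== $f =="
  llms --file "$f" "Summarize the key points"
done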
Custom Templates
# Use custom chat template
llms --chat custom-request.json "My prompt"
# Image with custom template
llms --chat image-request.json --image photo.jpg
# Audio with custom template
llms --chat audio-request.json --audio recording.mp3
Server Mode
# Start server
llms --serve 8000
# With verbose logging
llms --serve 8000 --verbose
# Custom port
llms --serve 3000
# Custom UI root
llms --serve 8000 --root ./my-ui
Configuration Management
# Initialize config
llms --init
# List all providers
llms ls
# List specific providers
llms ls groq anthropic openai
# Enable free providers
llms --enable openrouter_free google_free groq
# Enable paid providers
llms --enable openai anthropic grok
# Disable provider
llms --disable ollama
# Set default model
llms --default grok-4-fast
# Check provider status
llms --check groq
llms --check groq kimi-k2 llama4:400b gpt-oss:120b
# Update provider definitions from models.dev (auto-updated daily)
llms --update-providers
Extensions Management
# List available extensions from github.com/llmspy
llms --add
# Install an extension
llms --add fast_mcp
# Install a 3rd-party extension from GitHub
llms --add github-user/repo-name
# List installed extensions
llms --remove
# Uninstall an extension
llms --remove fast_mcp
Tool Calling & Function Calling
# Use all available tools (default)
llms "Read the file data.txt and calculate the sum"
# Use specific tools
llms --tools calc,get_current_time "What time is it in Tokyo and what's 15% of 230?"
# Disable all tools
llms --tools none "Tell me a joke"
# Tools work with any model that supports function calling
llms -m gpt-4o --tools calc "Calculate the area of a circle with radius 5"
Image Generation
# Generate image with default model
llms --out image "A serene mountain landscape at sunset"
# Generate with specific model by ID
llms -m "gemini-2.5-flash-image" --out image "Logo for a tech startup"
# Generate with specific model by name
llms -m "Gemini 2.5 Flash Image" --out image "cat in a hat"
# Images are saved to ~/.llms/cache with local path and HTTP URL
Audio Generation
# Generate audio with default TTS model
llms --out audio "Welcome to our podcast"
# With specific TTS model
llms -m gemini-2.5-pro-preview-tts --out audio "Merry Christmas"
# Generate with Flash TTS
llms -m gemini-2.5-flash-preview-tts --out audio "Hello world"
# Audio files are saved to ~/.llms/cache with local path and HTTP URL
Environment Variables
API Keys
OPENROUTER_API_KEY # OpenRouter
GEMINI_API_KEY # Gemini (Google)
ANTHROPIC_API_KEY # Claude (Anthropic)
OPENAI_API_KEY # OpenAI
GROQ_API_KEY # Groq API
ZHIPU_API_KEY # Z.ai Coding Plan
MINIMAX_API_KEY # MiniMax
DASHSCOPE_API_KEY # Qwen (Alibaba)
XAI_API_KEY # Grok (X.AI)
NVIDIA_API_KEY # NVIDIA NIM
GITHUB_TOKEN # GitHub Copilot Models
MISTRAL_API_KEY # Mistral
DEEPSEEK_API_KEY # DeepSeek
CHUTES_API_KEY # chutes.ai OSS LLM and Image Models
HF_TOKEN # Hugging Face
FIREWORKS_API_KEY # fireworks.ai OSS Models
CODESTRAL_API_KEY # Codestral (Mistral)
LMSTUDIO_API_KEY # Placeholder for local LM Studio
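Export a key before enabling its provider; a sketch using the Groq variable and commands shown above (the key value is a placeholder):
export GROQ_API_KEY="your-api-key"  # placeholder value
llms --enable groq
llms --check groq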
Other Settings
VERBOSE=1 # Enable verbose logging
DEBUG=1 # Enable DEBUG logging