Documentation
Lightweight OpenAI compatible CLI and server gateway for multiple LLMs
Introducing llms.py ð
llms.py is a super lightweight CLI tool and OpenAI-compatible server that acts as a configurable gateway over multiple configurable Large Language Model (LLM) providers.
Quick Start
Install
pip install llms-pySet API Keys
export OPENROUTER_API_KEY="sk-or-..."
export GROQ_API_KEY="gsk_..."ð Key Features
- ðļ Free: True Open source with no hidden costs or restrictions!
- ðŠķ Ultra-Lightweight: Single file with just one
aiohttpdependency - ð Multi-Provider Support: Access over 530 models from 24 providers
- ðŊ Intelligent Routing: Automatic failover between providers
- ðŧ Web UI: ChatGPT-like interface with dark mode
- ð Built-in Analytics: Track costs, tokens, and usage
- ð Privacy First: All data stored locally in browser
- ðģ Docker Ready: Pre-built images available
ðŊ OpenRouter but Local
llms.py is designed as a unified gateway that seamlessly connects you to multiple LLM providers through a single, consistent interface. Whether using cloud APIs or local models, llms provides intelligent routing and automatic failover to ensure your AI workflows connect to your chosen providers in your preferred priority - whether optimizing for cost, performance, or availability.
⥠Ultra-Lightweight Architecture
- Simplicity: Just one llms.py file (easily customizable)
- Single Dependency: Only requires
aiohttp(Pillow optional)- Zero Dependencies for ComfyUI - Ideal for use in Custom Nodes
- Flexibility: Works with any OpenAI-compatible client or framework
- Reliability: Automatic failover ensures your workflows never break
- Economy: Intelligent routing minimizes API costs
- Privacy: Mix local and cloud models based on your data sensitivity
- Future-Proof: Easily add new providers as they emerge
- No Setup: Just download and use, configure preferred LLMs in llms.json
llms.py transforms the complexity of managing multiple LLM providers into a simple, unified experience. Whether you're researching capabilities of new models, building the next breakthrough AI application, or just want reliable access to the best models available, llms.py has you covered.
Get started today and avoid expensive cloud lock-ins with the freedom of provider-agnostic AI development! ð
ð Expanded Provider Support
Acts as an intelligent gateway routing requests to over 530 models from 24 different providers via models.dev integration:
| Provider | Models | Provider | Models |
|---|---|---|---|
| OpenAI | 44+ | Alibaba | 38 |
| Anthropic | 10+ | Hugging Face | 13 |
| 26+ | Chutes | 68 | |
| OpenRouter | 200+ | DeepSeek | 2 |
| Groq | 17+ | Fireworks AI | 12 |
| xAI (Grok) | 22+ | GitHub Copilot | 27 |
| Mistral | 25+ | GitHub Models | 55 |
| Qwen | 20+ | Nvidia | 59 |
| Cerebras | 2 | MiniMax | 2 |
| Zai Coding Plan | 9+ | Ollama | local |
| Ollama Cloud | 29+ | LMStudio | local |
ð Automatic Provider Updates
Provider and model definitions are automatically updated daily from models.dev, or manually with:
llms --update-providersConfiguration
Enable providers with minimal configuration â all settings are inherited from models.dev:
{
"openai": { "enabled": true },
"anthropic": { "enabled": true },
"google": { "enabled": true }
}Learn more about Configuration â
ð Intelligent Request Routing
- Automatic Failover: If one provider fails, automatically retry with the next available provider
- Cost Optimization: Define free/cheap/local providers first to minimize costs
- Model Mapping: Use unified model names that map to different provider-specific names
ðĻ ChatGPT-like Web UI
A modern, fast, and privacy-focused web interface for interacting with all your LLMs.
- Offline & Private: All data stored locally in SQLite
- Dark Mode: Automatic or manual dark mode toggle
- Rich Markdown: Full markdown support with syntax highlighting
- Search: Quickly find past conversations
- Export/Import: Backup and transfer chat histories
Learn more about the Web UI â
ðĨïļ CLI Interface
- Interactive command-line tool
- Support for all modalities (text, image, audio, files)
- Custom system prompts
- Raw JSON output mode
ðŊ Multimodal Support
Process text, images, audio, and documents with capable models.
- Text Generation: Chat completions with any supported model
- Vision Models: Process images through vision-capable models (GPT-4V, Gemini Vision, etc.)
- Audio Processing: Handle audio inputs through audio-capable models
- Document Processing: Analyze PDFs and documents with capable models
- Drag & Drop: Easy file attachments in the UI
Learn more about Multimodal Support â
Flexible Deployment Options
- CLI Tool: Interactive command-line interface for quick queries
- HTTP Server: OpenAI-compatible server at
http://localhost:{PORT}/v1/chat/completions - Python Module: Import and use programmatically in your applications
- ComfyUI Node: Embed directly in ComfyUI workflows
Simple and Customizable
- Environment Variables: Secure API key management
- Provider Management: Easy enable/disable of providers
- Custom Models: Define your own model aliases and mappings
- Unified Configuration: Single llms.json to configure all providers and models
- Custom model aliases and mappings
- Flexible chat templates
- Environment variable support
Learn more about Configuration â
ð OpenAI-Compatible API
- Drop-in replacement for OpenAI API
- Works with any OpenAI-compatible client
- Streaming support
- Custom parameters
ð Analytics & Monitoring
Track costs, usage, and performance across all providers.
- Cost Tracking: Per-message, per-thread, and monthly cost analytics
- Token Metrics: Input/output token tracking
- Activity Logs: Detailed request history
- Provider Stats: Response times and reliability metrics
Learn more about Analytics â
ð Intelligent Provider Routing
Automatic failover and cost optimization across providers.
- Multi-Provider: Support for 10+ provider types with 160+ models
- Auto-Failover: If one provider fails, automatically try the next
- Cost Optimization: Route to free/cheap providers first
- Model Mapping: Unified model names across providers
Learn more about Providers â
ð Security
- Optional GitHub OAuth authentication
- User access restrictions
Learn more about GitHub OAuth â
Use Cases
For Developers
- API Gateway: Centralize all LLM provider access through one endpoint
- Cost Management: Automatically route to cheapest available providers
- Reliability: Built-in failover ensures high availability
For Enterprises
- Vendor Independence: Avoid lock-in to any single LLM provider
- Scalability: Distribute load across multiple providers
- Budget Control: Intelligent routing to optimize costs
For ComfyUI Users
- Hybrid Workflows: Access to both Ollama/LM Studio models with cloud APIs
- Zero Setup: Requires no additional dependencies, use custom builds with only what you need
Links
- ð llms GitHub Repository
- ðĶ llms-py PyPI Package