Introduction
Introducing llms.py - Lightweight OpenAI compatible CLI and server gateway
Introducing llms.py 🚀
We're excited to announce llms.py - a super lightweight CLI tool and OpenAI-compatible server that acts as a configurable gateway over multiple configurable Large Language Model (LLM) providers.
🎯 OpenRouter but Local
llms.py is designed as a unified gateway that seamlessly connects you to multiple LLM providers through a single, consistent interface. Whether using cloud APIs or local models, llms provides intelligent routing and automatic failover to ensure your AI workflows connect to your chosen providers in your preferred priority - whether optimizing for cost, performance, or availability.
⚡ Ultra-Lightweight Architecture
- Simplicity: Just one llms.py file (easily customizable)
- Single Dependency: Only requires
aiohttp(Pillow optional)- Zero Dependencies for ComfyUI - Ideal for use in Custom Nodes
- Flexibility: Works with any OpenAI-compatible client or framework
- Reliability: Automatic failover ensures your workflows never break
- Economy: Intelligent routing minimizes API costs
- Privacy: Mix local and cloud models based on your data sensitivity
- Future-Proof: Easily add new providers as they emerge
- No Setup: Just download and use, configure preferred LLMs in llms.json
llms.py transforms the complexity of managing multiple LLM providers into a simple, unified experience. Whether you're researching capabilities of new models, building the next breakthrough AI application, or just want reliable access to the best models available, llms.py has you covered.
Get started today and avoid expensive cloud lock-ins with the freedom of provider-agnostic AI development! 🎉
🌐 Configurable Multi-Provider Gateway
Acts as an intelligent gateway that can route requests for 160+ models across:
Cloud Providers with Free Tiers
- OpenRouter
- Groq
- Codestral
Premium Cloud Providers
- OpenAI
- Anthropic
- Grok
- Qwen
- Mistral
Local Providers
- Ollama
- Restrict access to custom models
- Or auto-discovery of installed models
Custom Providers
Use JSON config to add any OpenAI-compatible API endpoints and models
🔄 Intelligent Request Routing
- Automatic Failover: If one provider fails, automatically retry with the next available provider
- Cost Optimization: Define free/cheap/local providers first to minimize costs
- Model Mapping: Use unified model names that map to different provider-specific names
🚀 Key Features
🎨 ChatGPT-like Web UI
A modern, fast, and privacy-focused web interface for interacting with all your LLMs.
- Offline & Private: All data stored locally in browser IndexedDB
- Dark Mode: Automatic or manual dark mode toggle
- Rich Markdown: Full markdown support with syntax highlighting
- Search: Quickly find past conversations
- Export/Import: Backup and transfer chat histories
🖥️ CLI Interface
- Interactive command-line tool
- Support for all modalities (text, image, audio, files)
- Custom system prompts
- Raw JSON output mode
🎯 Multimodal Support
Process text, images, audio, and documents with capable models.
- Text Generation: Chat completions with any supported model
- Vision Models: Process images through vision-capable models (GPT-4V, Gemini Vision, etc.)
- Audio Processing: Handle audio inputs through audio-capable models
- Document Processing: Analyze PDFs and documents with capable models
- Drag & Drop: Easy file attachments in the UI
Learn more about Multimodal Support →
Flexible Deployment Options
- CLI Tool: Interactive command-line interface for quick queries
- HTTP Server: OpenAI-compatible server at
http://localhost:{PORT}/v1/chat/completions - Python Module: Import and use programmatically in your applications
- ComfyUI Node: Embed directly in ComfyUI workflows
Simple and Customizable
- Environment Variables: Secure API key management
- Provider Management: Easy enable/disable of providers
- Custom Models: Define your own model aliases and mappings
- Unified Configuration: Single llms.json to configure all providers and models
- Custom model aliases and mappings
- Flexible chat templates
- Environment variable support
Learn more about Configuration →
🌐 OpenAI-Compatible API
- Drop-in replacement for OpenAI API
- Works with any OpenAI-compatible client
- Streaming support
- Custom parameters
📊 Analytics & Monitoring
Track costs, usage, and performance across all providers.
- Cost Tracking: Per-message, per-thread, and monthly cost analytics
- Token Metrics: Input/output token tracking
- Activity Logs: Detailed request history
- Provider Stats: Response times and reliability metrics
🔄 Intelligent Provider Routing
Automatic failover and cost optimization across providers.
- Multi-Provider: Support for 10+ provider types with 160+ models
- Auto-Failover: If one provider fails, automatically try the next
- Cost Optimization: Route to free/cheap providers first
- Model Mapping: Unified model names across providers
🔒 Security
- Optional GitHub OAuth authentication
- User access restrictions
Learn more about GitHub OAuth →