
Analytics & Monitoring

Track costs, tokens, and usage across all providers

Overview

The analytics system provides:

  • Real-time Metrics: See costs and tokens for every request
  • Historical Data: Track usage over time
  • Provider Breakdown: Compare costs and performance per provider
  • Activity Logs: Detailed request history

Token Metrics

Per-Message Metrics

Every message shows its token count:

[Screenshot: Token Usage]

Displayed for each message:

  • Input tokens (user message)
  • Output tokens (AI response)
  • Total tokens

Thread-Level Metrics

At the bottom of each conversation:

  • Total cost
  • Total tokens (input + output)
  • Number of requests
  • Total response time
  • Average tokens per request
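
These totals are plain aggregates of the per-message metrics. A minimal sketch of the arithmetic (the Usage fields are illustrative, not llms.py's actual internals):

# Sketch: derive thread-level metrics from per-message usage records.
# Field names are illustrative, not the actual llms.py data model.
from dataclasses import dataclass

@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    cost: float           # dollars
    response_time: float  # seconds

def thread_metrics(requests: list[Usage]) -> dict:
    total_tokens = sum(u.input_tokens + u.output_tokens for u in requests)
    return {
        "total_cost": sum(u.cost for u in requests),
        "total_tokens": total_tokens,
        "requests": len(requests),
        "total_response_time": sum(u.response_time for u in requests),
        "avg_tokens_per_request": total_tokens / len(requests) if requests else 0,
    }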

Model Selector Metrics

The model selector displays pricing for each model:

  • Input cost per 1M tokens
  • Output cost per 1M tokens
  • Quick comparison between models

Cost Analytics

Monthly Cost Overview

Track your spending day by day:

[Screenshot: Cost Analytics]

Features:

  • Daily cost breakdown
  • Total monthly spend
  • Expandable details per day
  • Provider and model breakdown

Cost Breakdown

Click any day to see:

  • Cost per provider
  • Cost per model
  • Number of requests
  • Token usage

Token Analytics

Monthly Token Usage

Visualize token consumption over time:

[Screenshot: Token Analytics]

Shows:

  • Daily token usage
  • Input vs output tokens
  • Total monthly tokens
  • Trends over time

Token Breakdown

Expandable details show:

  • Tokens per provider
  • Tokens per model
  • Average tokens per request
  • Input/output ratio
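
For instance, the sample entry under Export Analytics below has 10 input tokens and 150 output tokens, an input/output ratio of 1:15.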

Activity Logs

Request History

Detailed log of all AI requests:

[Screenshot: Activity Log]

Each entry includes:

  • Model: Which model was used
  • Provider: Which provider served the request
  • Prompt: Partial preview of the prompt
  • Input Tokens: Tokens in the request
  • Output Tokens: Tokens in the response
  • Cost: Calculated cost for the request
  • Response Time: How long the request took
  • Speed: Tokens per second
  • Timestamp: When the request was made

Filtering and Sorting

The activity log can be searched, filtered, and sorted:

  • Search by prompt content
  • Filter by date range
  • Filter by provider
  • Filter by model
  • Sort by any column

Data Storage

Separate from Chat History

Analytics data is stored separately from chat conversations:

  • Clearing chat history preserves analytics
  • Deleting analytics preserves chat history
  • Independent export/import

Export Analytics

Hold ALT while clicking the Export button to export analytics data:

{
  "logs": [
    {
      "timestamp": "2025-11-15T10:30:00Z",
      "model": "grok-4-fast",
      "provider": "grok",
      "prompt": "What is...",
      "inputTokens": 10,
      "outputTokens": 150,
      "cost": 0.0024,
      "responseTime": 1.2,
      "speed": 125
    }
  ]
}
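
Because the export is plain JSON, it is easy to post-process. A minimal sketch that prints daily cost per provider from an exported file (only fields shown in the sample above are assumed; analytics.json is a placeholder filename):

# Sketch: summarize an exported analytics file by day and provider.
import json
from collections import defaultdict

with open("analytics.json") as f:        # placeholder filename
    logs = json.load(f)["logs"]

daily = defaultdict(lambda: defaultdict(float))
for entry in logs:
    day = entry["timestamp"][:10]        # e.g. "2025-11-15"
    daily[day][entry["provider"]] += entry["cost"]

for day in sorted(daily):
    for provider, cost in sorted(daily[day].items()):
        print(f"{day}  {provider:<12} ${cost:.4f}")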

Pricing Configuration

Pricing is configured in llms.json per provider:

{
  "providers": {
    "openai": {
      "pricing": {
        "gpt-5": {
          "input": 2.50,
          "output": 10.00
        },
        "gpt-4o": {
          "input": 2.50,
          "output": 10.00
        }
      },
      "default_pricing": {
        "input": 5.00,
        "output": 15.00
      }
    }
  }
}

Pricing is in dollars per 1M tokens.
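
In other words: multiply each token count by its per-1M rate, divide by one million, and fall back to default_pricing for models without an explicit entry. A rough sketch of that arithmetic (not llms.py's actual code):

# Sketch: cost of one request, given a provider's pricing config
# (the "openai" object from the example above).
def request_cost(provider_cfg: dict, model: str,
                 input_tokens: int, output_tokens: int) -> float:
    pricing = provider_cfg["pricing"].get(model, provider_cfg["default_pricing"])
    return (input_tokens * pricing["input"]
            + output_tokens * pricing["output"]) / 1_000_000

# e.g. a gpt-4o request with 10 input and 150 output tokens:
# (10 * 2.50 + 150 * 10.00) / 1_000_000 = $0.001525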

Free Models

Models from free providers show $0.00 cost:

  • OpenRouter free models
  • Groq free models
  • Google free tier models
  • Local Ollama models

Performance Metrics

Response Time

Track how fast providers respond:

  • Per-request response time
  • Average response time per provider
  • Identify slow providers or models

Speed (Tokens/Second)

Measure generation speed:

  • Output tokens per second
  • Compare model speeds
  • Optimize for faster models when needed
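
For example, the sample entry under Export Analytics above records 150 output tokens generated in 1.2 seconds: 150 / 1.2 = 125 tokens per second, which matches its speed field.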

Provider Checking

Test provider connectivity and performance:

# Check all models for a provider
llms --check groq

# Check specific models
llms --check groq kimi-k2 llama4:400b

[Screenshot: Provider Check]

Shows:

  • ✅ Working models
  • ❌ Failed models
  • Response times
  • Provider availability

Automated Checks

GitHub Actions runs automated provider checks:

  • Tests all configured providers
  • Tests all models
  • Publishes results to /checks/latest.txt
  • Runs on schedule

Use Cases

Cost Optimization

  • Identify expensive models
  • Compare provider costs
  • Route to cheaper alternatives
  • Set budget limits

Performance Monitoring

  • Find fastest providers
  • Identify slow models
  • Optimize for speed vs cost
  • Detect provider issues

Usage Analysis

  • Track which models you use most
  • See token consumption patterns
  • Identify heavy usage periods
  • Plan capacity needs

Debugging

  • Review failed requests
  • Check response times
  • Verify token counts
  • Audit provider usage

Best Practices

Cost Management

  1. Use Free Tiers First: Enable free providers ahead of paid ones in llms.json
  2. Monitor Daily Spend: Check analytics regularly
  3. Set Budget Alerts: Keep track of monthly costs
  4. Choose Appropriate Models: Use cheaper models for simple tasks

Performance Optimization

  1. Check Provider Status: Run --check periodically
  2. Monitor Response Times: Identify slow providers
  3. Balance Speed vs Cost: Choose based on needs
  4. Use Local Models: For privacy-sensitive or high-volume tasks

Data Hygiene

  1. Export Regularly: Backup analytics data
  2. Clean Old Logs: Remove outdated entries
  3. Separate Environments: Use different ports for different use cases

Privacy

Analytics data is stored locally:

  • ✅ No external tracking
  • ✅ No data sent to third parties
  • ✅ Full control over data
  • ✅ Easy to delete or export
