
Analytics

Track costs and tokens, and monitor usage across all providers

Overview

The built-in analytics extension gives the UI a comprehensive view of your request usage, with detailed insights into your AI usage across all configured providers and models.

The analytics system provides:

  • Real-time Metrics: See costs and tokens for every request
  • Historical Data: Track usage over time
  • Provider Breakdown: Compare costs and performance per provider
  • Activity Logs: Detailed request history

Token Metrics

Per-Message Metrics

Every message displays its token counts:

  • Input tokens (user message)
  • Output tokens (AI response)
  • Total tokens

Thread-Level Metrics

At the bottom of each conversation:

  • Total cost
  • Total tokens (input + output)
  • Number of requests
  • Total response time
  • Average tokens per request
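
The thread-level figures are simple aggregates of the per-message metrics. A minimal sketch of the relationship (the field and function names below are illustrative, not llms.py's actual data model):

from dataclasses import dataclass

@dataclass
class MessageMetrics:
    input_tokens: int
    output_tokens: int
    cost: float        # USD
    response_ms: int   # response time in milliseconds

def thread_summary(messages: list[MessageMetrics]) -> dict:
    # Totals across the whole conversation thread
    total_tokens = sum(m.input_tokens + m.output_tokens for m in messages)
    requests = len(messages)
    return {
        "total_cost": sum(m.cost for m in messages),
        "total_tokens": total_tokens,
        "requests": requests,
        "total_response_ms": sum(m.response_ms for m in messages),
        "avg_tokens_per_request": total_tokens / requests if requests else 0,
    }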

Model Selector Metrics

The model selector displays pricing for each model:

  • Input cost per 1M tokens
  • Output cost per 1M tokens
  • Quick comparison between models
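
Per-request cost follows directly from these per-1M-token prices. A quick sketch of the arithmetic (the prices used are placeholders, not any provider's real pricing):

def request_cost(input_tokens: int, output_tokens: int,
                 input_per_1m: float, output_per_1m: float) -> float:
    # Cost in USD, given prices quoted per 1M tokens
    return (input_tokens * input_per_1m + output_tokens * output_per_1m) / 1_000_000

# e.g. 1,200 input + 800 output tokens at $0.50 / $1.50 per 1M tokens:
# (1200 * 0.50 + 800 * 1.50) / 1,000,000 = $0.0018
print(request_cost(1200, 800, 0.50, 1.50))  # 0.0018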

Cost Analytics

Monthly Cost Overview

Track your spending day by day:

  • Daily cost breakdown
  • Total monthly spend
  • Expandable details per day
  • Provider and model breakdown

Cost Breakdown

Click any day to see:

  • Cost per provider
  • Cost per model
  • Number of requests
  • Token usage
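
Mechanically, this breakdown is a group-by over the day's request log. A hedged sketch, assuming each log entry carries provider, model, cost, and token fields (the key names here are illustrative):

from collections import defaultdict

def breakdown_by_provider_and_model(entries: list[dict]) -> dict:
    # Aggregate one day's log entries per (provider, model) pair
    groups = defaultdict(lambda: {"cost": 0.0, "requests": 0, "tokens": 0})
    for e in entries:
        g = groups[(e["provider"], e["model"])]
        g["cost"] += e["cost"]
        g["requests"] += 1
        g["tokens"] += e["input_tokens"] + e["output_tokens"]
    return dict(groups)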

Token Analytics

Monthly Token Usage

Visualize token consumption over time:

  • Daily token usage
  • Input vs output tokens
  • Total monthly tokens
  • Trends over time

Token Breakdown

Expandable details show:

  • Tokens per provider
  • Tokens per model
  • Average tokens per request
  • Input/output ratio

Activity Logs

Request History

Detailed log of all AI requests:

Each entry includes:

  • Model: Which model was used
  • Provider: Which provider served the request
  • Prompt: Partial preview of the prompt
  • Input Tokens: Tokens in the request
  • Output Tokens: Tokens in the response
  • Cost: Calculated cost for the request
  • Response Time: How long the request took
  • Speed: Tokens per second (derived as sketched below)
  • Timestamp: When the request was made
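
Speed is a derived column: output tokens divided by response time. A small sketch of the calculation (the function name is ours, for illustration):

def tokens_per_second(output_tokens: int, response_ms: int) -> float:
    # Generation speed in tokens/second from token count and elapsed milliseconds
    return output_tokens / (response_ms / 1000) if response_ms else 0.0

# e.g. 900 output tokens generated in 4,500 ms -> 200 tokens/second
print(tokens_per_second(900, 4500))  # 200.0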

Filtering and Sorting

Search and narrow the log:

  • Search by prompt content
  • Filter by date range
  • Filter by provider
  • Filter by model
  • Sort by any column

Data Storage

Separate from Chat History

Analytics data is stored separately from chat conversations:

  • Clearing chat history preserves analytics
  • Deleting analytics preserves chat history

Provider Checking

Test provider connectivity and performance:

# Check all models for a provider
llms --check groq

# Check specific models
llms --check groq kimi-k2 llama4:400b

Shows:

  • ✅ Working models
  • ❌ Failed models
  • Response times
  • Provider availability

Automated Checks

GitHub Actions runs automated provider checks:

  • Tests all configured providers
  • Tests all models
  • Publishes results to /checks/latest.txt
  • Runs on schedule

Next Steps