Audio Generation
Generate audio using various LLM providers
Audio generation is an emerging capability with limited provider support. Text-to-Speech (TTS) generation, available through both the UI and the CLI, currently supports only Google's latest TTS models:
| Model | Description |
|---|---|
| Gemini 2.5 Flash Preview TTS | Fast, lightweight TTS |
| Gemini 2.5 Pro Preview TTS | High-quality TTS |
You'd typically find and select an audio generation model from the Model Selector. However, although models.dev also lists other models (such as Alibaba's) as capable of audio generation, only Gemini's TTS models are currently supported, via Gemini's API, because Alibaba doesn't yet support the audio modality.
UI & Command-Line Usage
Audio generation is available both in the UI and on the command line, using --out audio:
llms --out audio "Merry Christmas"
llms -m gemini-2.5-pro-preview-tts --out audio "Merry Christmas"
Output
Audio files are saved locally and accessible via HTTP URL:
Saved files:
/Users/llmspy/.llms/cache/c2/c27b5fd43ebbdbca...acf118.wav
http://localhost:8000/~cache/c2/c27b5fd43ebbdbca...acf118.wav
Playback
From the command line, for example with SoX's play command:
play /Users/llmspy/.llms/cache/c2/c27b5fd43ebbdbca...acf118.wav
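You can replay the most recent result without copying its path by picking the newest .wav file in the cache. The sketch below assumes the cache layout shown in the example paths, with one subdirectory per two-character hash prefix. It also assumes SoX's play is installed. The function names latest_audio and play_latest_audio are hypothetical and not part of llms.

```shell
# Hypothetical helpers (not part of llms). Assume generated audio lands in
# ~/.llms/cache/<2-char prefix>/<hash>.wav, as in the example paths above.

latest_audio() {
    # ls -t lists the newest modification time first; 2>/dev/null hides the
    # error ls prints when the glob matches nothing
    ls -t "$HOME"/.llms/cache/*/*.wav 2>/dev/null | head -n 1
}

play_latest_audio() {
    f=$(latest_audio)
    if [ -z "$f" ]; then
        echo "no .wav files found in $HOME/.llms/cache" >&2
        return 1
    fi
    play "$f"   # SoX's command-line player
}
```

Source the functions, then run play_latest_audio. Sorting by modification time is only a heuristic. If llms reuses an existing cache entry instead of rewriting it, that file may not be the newest.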
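📁 All generated audio files are saved to ~/.llms/cache, using their SHA-256 hash as the filename, inside a subdirectory named after the hash's first two characters (e.g. c2/c27b5fd43ebbdbca...acf118.wav).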
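Because cache filenames are content hashes, you can work out which cache entry matches a given audio file. You can also turn a cache path into the URL shown in the "Saved files:" output. The shell sketch below rests on three assumptions. First, the SHA-256 is computed over the audio file's bytes; the note doesn't say exactly what is hashed. Second, the subdirectory is the hash's first two hex characters, as in the example paths. Third, the server listens on http://localhost:8000, as in the example URL. The function names are hypothetical and not part of the llms CLI.

```shell
# Hypothetical helpers (not part of llms) that reproduce the cache layout
# seen in the "Saved files:" output. Assumes the SHA-256 is taken over the
# file's bytes and the first two hex characters name the subdirectory.

sha256_of() {
    # Hash via stdin so both tools print "HASH  -"; keep only the hash field
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum < "$1" | cut -d ' ' -f 1          # GNU coreutils (Linux)
    else
        shasum -a 256 < "$1" | cut -d ' ' -f 1      # shasum (macOS)
    fi
}

expected_cache_path() {
    hash=$(sha256_of "$1")
    [ -n "$hash" ] || return 1                       # unreadable file
    ext="${2:-wav}"                                  # .wav as in the example
    prefix=$(printf '%s' "$hash" | cut -c 1-2)       # e.g. c2 for c27b...
    printf '%s/.llms/cache/%s/%s.%s\n' "$HOME" "$prefix" "$hash" "$ext"
}

cache_url() {
    cache_dir="$HOME/.llms/cache"
    base_url="${2:-http://localhost:8000}"           # host/port from the example
    case "$1" in
        "$cache_dir"/*)
            # Drop the cache directory prefix and serve the rest under /~cache/
            printf '%s/~cache/%s\n' "$base_url" "${1#"$cache_dir"/}" ;;
        *)
            echo "not under $cache_dir: $1" >&2
            return 1 ;;
    esac
}
```

If the assumptions hold, running expected_cache_path on a file already in the cache prints that file's own path. Passing that path to cache_url then prints its http://localhost:8000/~cache/... URL.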