Audio Generation

Audio generation is an emerging capability with limited provider support. Text-to-Speech generation, available through both the UI and the CLI, currently supports only Google's latest TTS models:

Model	Description
Gemini 2.5 Flash Preview TTS	Fast, lightweight TTS
Gemini 2.5 Pro Preview TTS	High-quality TTS

Typically, you'd use the Model Selector to find and select a model that supports audio generation.

However, not every model that models.dev lists as capable of audio generation works: only Gemini's TTS models are currently supported, through Gemini's API, as Alibaba doesn't yet support the audio modality.

UI & Command-Line Usage

Audio generation is available in the UI and on the command line using --out audio:

llms --out audio "Merry Christmas"
llms -m gemini-2.5-pro-preview-tts --out audio "Merry Christmas"
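
For scripted use, the two invocations above can be wrapped in a small Python helper. This is a minimal sketch, assuming the llms executable is on your PATH; the function names build_audio_command and generate_audio are hypothetical and not part of llms itself, and only the -m and --out audio flags shown above are used.

```python
import subprocess
from typing import List, Optional


def build_audio_command(prompt: str, model: Optional[str] = None) -> List[str]:
    """Assemble the argv list for `llms --out audio`, optionally pinning a model with -m.

    Hypothetical helper: it only mirrors the two command lines shown in the docs.
    """
    cmd = ["llms"]
    if model:
        cmd += ["-m", model]
    cmd += ["--out", "audio", prompt]
    return cmd


def generate_audio(prompt: str, model: Optional[str] = None) -> str:
    """Run the CLI and return whatever it printed to stdout.

    Assumes `llms` is installed and a Gemini API key is configured;
    raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        build_audio_command(prompt, model),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling generate_audio("Merry Christmas", "gemini-2.5-pro-preview-tts") would then run the second command above and return its printed output as a string.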

Output

Audio files are saved locally and are accessible via an HTTP URL:

Saved files:
/Users/llmspy/.llms/cache/c2/c27b5fd43ebbdbca...acf118.wav
http://localhost:8000/~cache/c2/c27b5fd43ebbdbca...acf118.wav
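
If you capture this output in a script, the local path and the URL can be separated with a few lines of Python. The sketch below assumes the output keeps the shape shown above (a "Saved files:" header followed by one path and one URL); the exact format is not guaranteed, and parse_saved_files is a hypothetical helper. The sample text reuses the abbreviated filename from the example as-is.

```python
from typing import Dict, Optional

# Sample copied from the documentation; the "..." in the filename is part of
# the abbreviated example, not a real hash.
SAMPLE_OUTPUT = """Saved files:
/Users/llmspy/.llms/cache/c2/c27b5fd43ebbdbca...acf118.wav
http://localhost:8000/~cache/c2/c27b5fd43ebbdbca...acf118.wav
"""


def parse_saved_files(output: str) -> Dict[str, Optional[str]]:
    """Return the first local path and first URL listed after 'Saved files:'.

    Assumption: paths start with '/' or '~' and URLs with http:// or https://.
    """
    found: Dict[str, Optional[str]] = {"path": None, "url": None}
    seen_header = False
    for line in output.splitlines():
        line = line.strip()
        if line == "Saved files:":
            seen_header = True
            continue
        if not seen_header or not line:
            continue
        if line.startswith(("http://", "https://")):
            found["url"] = found["url"] or line
        elif line.startswith(("/", "~")):
            found["path"] = found["path"] or line
    return found


saved = parse_saved_files(SAMPLE_OUTPUT)
print(saved["path"])
print(saved["url"])
```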

Playback

From the command line:

play /Users/llmspy/.llms/cache/c2/c27b5fd43ebbdbca...acf118.wav

INFO

📁 All generated audio files are saved to ~/.llms/cache using their SHA-256 hash as the filename.
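
As an illustration of this naming scheme, the sketch below derives a cache path from a SHA-256 digest. Two points are assumptions rather than documented behavior: that the hash is computed over the file's bytes, and that the subdirectory is the first two hex characters of the digest (inferred from the cache/c2/c27b... path in the example output). cache_path_for is a hypothetical name.

```python
import hashlib
from pathlib import Path


def cache_path_for(
    data: bytes,
    ext: str = ".wav",
    root: Path = Path.home() / ".llms" / "cache",
) -> Path:
    """Build <root>/<first two hex chars>/<sha256 hex digest><ext>.

    Assumption: the digest is taken over the raw file bytes; the real tool
    may hash something else.
    """
    digest = hashlib.sha256(data).hexdigest()
    return root / digest[:2] / f"{digest}{ext}"


# Example with placeholder bytes standing in for real audio data.
example = cache_path_for(b"abc")
print(example)
```

Because the filename is derived from a hash, identical inputs to the hash map to the same cache file, which is what makes the saved paths stable and safe to link to.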
