Voice Input
Adds voice-to-text transcription to the chat UI via a microphone button or the Alt+D keyboard shortcut.
The voice extension supports three transcription modes, tried in order: voxtral, transcribe, and voxtral-mini-latest, using the first one that is available.
To remove modes or change their priority, override them with the LLMS_VOICE environment variable, e.g.:

```sh
export LLMS_VOICE="transcribe,voxtral-mini-latest"
```

Usage
🎤 Microphone Button
Click the microphone icon in the chat input area to start recording. Click again to stop and transcribe.
When the voice extension is enabled, the microphone button appears in the chat input area and the Alt+D keyboard shortcut becomes available for voice input.
Keyboard Shortcut
Alt+D toggles voice recording with two modes:
- Tap (< 500ms): Toggle mode - starts recording, press again to stop
- Hold (≥ 500ms): Push-to-talk - records while held, stops when released
The transcribed text is appended to the current message input.
To turn voice input off, disable the voice extension or set LLMS_VOICE="" to disable all modes.
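The priority logic described above can be sketched in shell. The `first_available_mode` helper below is a hypothetical illustration, not the extension's actual code: it walks a comma-separated mode list and prints the first mode whose command exists on `$PATH`. A cloud mode such as voxtral-mini-latest would need a different availability check, e.g. whether MISTRAL_API_KEY is set.

```shell
# Hypothetical sketch of mode-priority resolution; not the extension's code.
first_available_mode() {
  modes=$1
  old_ifs=$IFS
  IFS=','
  # Unquoted expansion splits the list on the comma set in IFS.
  for mode in $modes; do
    # Treat a mode as available if a command of that name is on $PATH.
    if command -v "$mode" >/dev/null 2>&1; then
      IFS=$old_ifs
      printf '%s\n' "$mode"
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# Example: "sh" stands in for an installed transcription tool.
first_available_mode "no-such-tool,sh"
```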
Available Modes
Voice Input will use the first available mode.
voxtype
Uses the voxtype.io CLI tool for local transcription.
Requirements:
- `voxtype` must be installed and on your `$PATH`
- `ffmpeg` must be installed for audio format conversion
Installation
Voxtype works on GNOME, KDE, Sway, Hyprland, and River, on Wayland or X11, with native packages for Arch Linux, Debian, Ubuntu, and Fedora, and macOS support via their source builds.
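Whether the requirements are met can be checked from a terminal. The `check_missing` helper below is an illustration (not part of voxtype or the extension) that prints each named command not found on `$PATH`:

```shell
# Print each named command that is not found on $PATH.
check_missing() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "missing: $cmd"
  done
}

# No output means both requirements are satisfied.
check_missing voxtype ffmpeg
```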
transcribe
Use your preferred speech-to-text tool by creating a custom transcribe script or executable.
Requirements:
- A `transcribe` executable on your `$PATH` that accepts a WAV audio file and outputs text to stdout
- `ffmpeg` must be installed for audio format conversion
Interface:
```sh
transcribe recording.wav > transcript.txt
```

See Creating a transcribe Script for implementation examples.
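Before wiring up a real engine, the interface can be exercised with a throwaway stub that ignores its audio argument and prints fixed text. This is an illustration only; the path /tmp/transcribe-stub is arbitrary.

```shell
# Write a minimal stub that satisfies the transcribe interface, then run it.
cat > /tmp/transcribe-stub <<'EOF'
#!/usr/bin/env bash
# A real implementation would transcribe the wav file passed as $1;
# this stub just prints fixed text to stdout.
echo "this is a test transcript"
EOF
chmod +x /tmp/transcribe-stub

/tmp/transcribe-stub recording.wav
# prints "this is a test transcript"
```

Once a real engine replaces the echo, the same invocation shape works unchanged from the extension's point of view.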
voxtral-mini-latest
Uses Mistral's Voxtral model for cloud-based transcription. A good option if you want to avoid downloading a large model and using local CPU resources.
Requirements:
- Mistral provider must be enabled in your configuration
- `MISTRAL_API_KEY` environment variable must be set
Pricing: ~$0.003/minute
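For reference, a raw transcription request for this mode looks roughly like the curl call below. The endpoint path, form fields, and auth header are assumptions based on Mistral's public audio transcription API; verify them against Mistral's documentation before relying on this.

```shell
# Assumed Mistral audio transcription call -- check Mistral's API reference.
curl -s https://api.mistral.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -F model=voxtral-mini-latest \
  -F file=@recording.wav
```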
Creating a transcribe Script
Make the script executable and add it to your $PATH:
```sh
chmod +x ./transcribe
sudo ln -s "$(pwd)/transcribe" /usr/local/bin/transcribe
```

Using OpenAI Whisper
Create a script using uvx and openai-whisper:
./transcribe

```sh
#!/usr/bin/env bash
uvx --from openai-whisper whisper "$1" --model base.en --output_format txt --output_dir /tmp >/dev/null 2>&1
BASENAME=$(basename "${1%.*}")
cat "/tmp/${BASENAME}.txt"
rm -f "/tmp/${BASENAME}.txt"
```

Using Whisper.cpp
whisper.cpp provides a faster, dependency-free C++ implementation.
Setup:
```sh
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp

# Download a model
sh ./models/download-ggml-model.sh base.en

# Build
cmake -B build
cmake --build build -j --config Release

# Test
./build/bin/whisper-cli -f samples/jfk.wav
```

Create the transcribe script:
./transcribe

```sh
#!/usr/bin/env bash
SCRIPT_DIR="$(cd "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")" && pwd)"
MODEL="$SCRIPT_DIR/models/ggml-base.en.bin"
CLI="$SCRIPT_DIR/build/bin/whisper-cli"
TMPFILE=$(mktemp /tmp/whisper-XXXXXX)
trap 'rm -f "$TMPFILE" "${TMPFILE}.txt"' EXIT
"$CLI" -m "$MODEL" -otxt -f "$1" -of "$TMPFILE" >/dev/null 2>&1
cat "${TMPFILE}.txt"
```