Effortlessly convert your speech to text directly within Raycast using the power of whisper.cpp. This extension provides a simple interface to record audio and transcribe it locally and privately on your machine. Refine the text with custom prompts, privately using Ollama, or with Raycast AI or any OpenAI-compatible (v1) API.
Transcription uses whisper.cpp running locally on your machine through Raycast.

Before installing the extension, you need the following installed and configured on your system:
1. whisper.cpp: You must install whisper-cpp:
brew install whisper-cpp
2. A Whisper model: Download one with the Download Whisper Model extension command, which configures the model's path automatically. Alternatively, download a model file (ggml-{model}.bin) yourself and point the extension to its path in preferences.
3. SoX: This extension uses the SoX (Sound eXchange) utility for audio recording:
brew install sox
By default, the extension expects sox to be at /opt/homebrew/bin/sox. If yours is installed somewhere else, point the extension to its executable in preferences.

This extension is now available to download from the Raycast Store. However, if you'd prefer to build from source, see below.
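This README does not document how the extension locates and drives SoX. The following is a minimal sketch of that step. It assumes Node's child_process is used, and that the Apple Silicon (/opt/homebrew) and Intel (/usr/local) Homebrew prefixes are the only defaults worth checking. resolveSoxPath and buildSoxRecordArgs are hypothetical names. The flags are standard SoX options: -d for the default input device, -r for sample rate, -c for channels and -b for bit depth. The 16 kHz, mono, 16-bit WAV format is the one whisper.cpp's command-line tool documents.

```typescript
import { existsSync } from "node:fs";

// Assumed default locations: Apple Silicon Homebrew first, then Intel Homebrew.
const SOX_CANDIDATES = ["/opt/homebrew/bin/sox", "/usr/local/bin/sox"];

// Hypothetical helper: returns the preference path if it exists, otherwise the
// first existing default candidate, otherwise undefined (SoX not found).
export function resolveSoxPath(
  preferencePath: string | undefined,
  exists: (path: string) => boolean = existsSync,
): string | undefined {
  const candidates = preferencePath ? [preferencePath, ...SOX_CANDIDATES] : SOX_CANDIDATES;
  return candidates.find((candidate) => exists(candidate));
}

// Hypothetical helper: record from the default input device (-d) into a
// 16 kHz (-r), mono (-c), 16-bit (-b) WAV file. Format options placed before
// the output file name apply to the output.
export function buildSoxRecordArgs(outputWav: string): string[] {
  return ["-d", "-r", "16000", "-c", "1", "-b", "16", outputWav];
}
```

The resulting arguments could be passed to spawn(soxPath, args) from node:child_process. The recording process would then be stopped when the user finishes speaking.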
After installing, configure the extension preferences in Raycast. If you installed both SoX and whisper-cpp with Homebrew and downloaded a model using the extension, this should all be pre-configured for you. The extension also confirms the SoX and whisper-cli paths on first launch, so once they are configured you can start simple dictation immediately:
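1. Open Raycast preferences (⌘ + ,).
2. Go to Extensions.
3. Set the path to your whisper.cpp executable (e.g., /path/to/your/whisper.cpp/build/bin/whisper-cli, or /usr/local/bin/whisper-cpp).
4. Set the path to your .bin model file (e.g., /path/to/your/whisper.cpp/models/ggml-base.en.bin).
5. Set the path to your SoX executable if it is not at the default location (e.g., /usr/local/bin/sox).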
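The executable and model paths above come together when a recording is transcribed. The following is a sketch of that step, not the extension's actual code. It assumes whisper-cli is called directly with its -m (model), -f (input file), -nt (no timestamps) and -l (language) flags, and that the transcript is read from stdout. buildWhisperArgs, cleanTranscript and transcribe are hypothetical helpers.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical helper: builds whisper-cli arguments from the configured paths.
// Rejects anything that is not a ggml .bin model file.
export function buildWhisperArgs(modelPath: string, wavPath: string, language = "en"): string[] {
  if (!modelPath.endsWith(".bin")) {
    throw new Error(`Expected a ggml .bin model file, got: ${modelPath}`);
  }
  return ["-m", modelPath, "-f", wavPath, "-nt", "-l", language];
}

// whisper-cli prints one segment per line; join them into a single string.
export function cleanTranscript(stdout: string): string {
  return stdout
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join(" ");
}

// Runs the executable and returns the cleaned transcript from stdout.
export async function transcribe(whisperPath: string, modelPath: string, wavPath: string): Promise<string> {
  const { stdout } = await execFileAsync(whisperPath, buildWhisperArgs(modelPath, wavPath));
  return cleanTranscript(stdout);
}
```

Passing "auto" as the language enables whisper.cpp's language auto-detection, which is useful with the multilingual models.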
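After transcription, the extension performs the default action you choose in preferences:
- Paste Text: Pastes the text into the active application.
- Copy to Clipboard: Copies the text to the clipboard.
- None (Show Options): Shows the transcribed text in Raycast with manual Paste/Copy actions (Default).

AI refinement is set up separately with the Configure AI Refinement command and is disabled by default.

To download a model:
1. Run the Download Whisper Model command and choose the model you would like to download with Enter.
2. If you have multiple models downloaded, select the one to use with Enter.
3. Delete a downloaded model with Ctrl+X.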
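Downloaded models follow the ggml-{model}.bin naming mentioned above. The following is an illustration only. It assumes models are fetched from the Hugging Face repository that whisper.cpp's own download-ggml-model.sh script uses; the extension may use a different source. Under that assumption, the download URL and local path for the models this README lists could be derived as shown. All function names and the single-directory layout are hypothetical.

```typescript
import { join } from "node:path";

// The models this README lists for the built-in downloader.
export const SUPPORTED_MODELS = [
  "tiny.en",
  "base.en",
  "small.en",
  "medium.en",
  "tiny",
  "base",
  "small",
  "medium",
  "large-v3",
  "large-v3-turbo",
] as const;

export type SupportedModel = (typeof SUPPORTED_MODELS)[number];

// Assumption: the Hugging Face repository used by whisper.cpp's download script.
const MODEL_BASE_URL = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main";

// Narrows an arbitrary string (e.g. a list selection) to a supported model name.
export function isSupportedModel(name: string): name is SupportedModel {
  return (SUPPORTED_MODELS as readonly string[]).includes(name);
}

// Model files follow the ggml-{model}.bin naming convention.
export function modelFileName(model: SupportedModel): string {
  return `ggml-${model}.bin`;
}

export function modelDownloadUrl(model: SupportedModel): string {
  return `${MODEL_BASE_URL}/${modelFileName(model)}`;
}

// Hypothetical layout: one directory holding all downloaded models.
export function modelLocalPath(modelsDir: string, model: SupportedModel): string {
  return join(modelsDir, modelFileName(model));
}
```

Large files such as large-v3 (3.1 GB) are better streamed to disk than buffered in memory.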
To dictate:
1. Launch the Dictate Text command. The extension window will appear, showing a "RECORDING AUDIO..." message and a waveform animation. Start speaking clearly.
2. Press Enter when you are finished speaking, or press ⌘ + . (or click "Cancel Recording") to abort.
3. whisper.cpp processes the audio. This may take a few seconds depending on the audio length and model size.
4. Choose an action for the result:
   - Paste Text: Pastes the content.
   - Copy Text (⌘ + Enter): Copies the content.
   - Preferences: Opens the extension preferences.
   - Close (Esc): Closes the Raycast window.

Open Dictation History anytime you need a past transcription. It currently stores up to 100 entries.
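A 100-entry limit implies a capped list in which each new transcription pushes out the oldest one. The following is a minimal sketch. It assumes entries are kept newest-first in an array and persisted, for example, with Raycast's LocalStorage. The HistoryEntry shape and the function names are hypothetical.

```typescript
// Hypothetical shape of a stored transcription.
export interface HistoryEntry {
  id: string;
  text: string;
  timestamp: number;
}

export const MAX_HISTORY = 100;

// Puts the newest entry first and drops the oldest ones beyond the cap.
export function addToHistory(
  history: HistoryEntry[],
  entry: HistoryEntry,
  max: number = MAX_HISTORY,
): HistoryEntry[] {
  return [entry, ...history].slice(0, max);
}

// Removes a single entry by id, leaving the rest in order.
export function removeFromHistory(history: HistoryEntry[], id: string): HistoryEntry[] {
  return history.filter((entry) => entry.id !== id);
}
```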
Delete a single entry with Ctrl+X, or clear the whole history with Ctrl+Shift+X for a fresh start.

Automate the formatting and style of your transcriptions by refining them with AI. This feature can reformat text, correct grammar, or apply custom instructions based on your needs.
How it Works:
1. Open Configure AI Refinement and press Enter.
2. Set the API endpoint (e.g., http://localhost:11434 for Ollama, or https://api.openai.com).
3. Set the model (e.g., llama3.2:latest, gpt-4o). If using Ollama, make sure the model is pulled and available; check with ollama ls.
4. Use the Configure AI Refinement command to manage how the AI refines your text.

Using the Configure AI Refinement Command:
This command allows you to customize the instructions given to the AI:
- Edit the selected prompt (Ctrl-E).
- Delete the selected prompt (Ctrl + X).

When AI refinement is enabled, the text is sent to your chosen AI along with the active prompt after the initial transcription. The refined text is then handled the same way as regular transcribed text and stored in your dictation history.
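The refinement step above can be pictured as a single chat request, with the active prompt as the system message and the transcript as the user message. This sketch assumes the OpenAI-compatible /v1/chat/completions route, which Ollama also serves at http://localhost:11434. Raycast AI is not covered. buildRefinementRequest, refineTranscript and isModelListed are hypothetical names. isModelListed reads the parsed JSON from Ollama's /api/tags endpoint, which lists the same models as ollama ls.

```typescript
// Hypothetical request shape; the extension's real request code may differ.
export interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds an OpenAI-compatible chat completion request.
// endpoint is a base URL such as http://localhost:11434 or https://api.openai.com.
export function buildRefinementRequest(
  endpoint: string,
  model: string,
  prompt: string,
  transcript: string,
  apiKey?: string,
): ChatRequest {
  const base = endpoint.replace(/\/+$/, ""); // drop trailing slashes
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) {
    headers["Authorization"] = `Bearer ${apiKey}`; // a local Ollama needs no key
  }
  return {
    url: `${base}/v1/chat/completions`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({
        model,
        messages: [
          { role: "system", content: prompt }, // the active refinement prompt
          { role: "user", content: transcript }, // the raw whisper.cpp output
        ],
      }),
    },
  };
}

// Sends the request and returns the refined text from the first choice.
export async function refineTranscript(
  endpoint: string,
  model: string,
  prompt: string,
  transcript: string,
  apiKey?: string,
): Promise<string> {
  const { url, init } = buildRefinementRequest(endpoint, model, prompt, transcript, apiKey);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Refinement request failed with HTTP ${response.status}`);
  }
  const data = (await response.json()) as { choices?: { message?: { content?: string } }[] };
  const content = data.choices?.[0]?.message?.content;
  if (!content) {
    throw new Error("Refinement response contained no text");
  }
  return content.trim();
}

// Checks a parsed GET /api/tags response from Ollama for a model name.
// A name without a tag is treated as ":latest".
export function isModelListed(tags: { models?: { name: string }[] }, model: string): boolean {
  const wanted = model.includes(":") ? model : `${model}:latest`;
  return (tags.models ?? []).some((entry) => entry.name === wanted);
}
```

Keeping request construction separate from fetch makes the request easy to inspect when an endpoint or model name is misconfigured.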
The extension's downloader currently supports the following Whisper models. You can also download any other model you need from ggerganov/whisper.cpp and configure its path in the extension's preferences:
- tiny.en (78 MB): Smallest, fastest and least accurate, but optimised for English.
- base.en (148 MB): Small and fast; the same size as base but more accurate if you are only transcribing English.
- small.en (488 MB): Optimised for English; slightly larger and more accurate than base while not consuming too many resources.
- medium.en (1.53 GB): Larger again and optimised for English; transcription is slower and uses more resources than the models above, but is more accurate.
- tiny (78 MB): Smallest, fastest, least accurate.
- base (148 MB): Small, fast and multilingual.
- small (488 MB): Still fairly fast and multilingual.
- medium (1.53 GB): Slower, more accurate and multilingual.
- large-v3 (3.1 GB): The largest, slowest and most accurate model available. Use it only if you have a powerful computer or a lot of time on your hands, especially for longer transcriptions.
- large-v3-turbo (1.62 GB): Based on the large model but much faster, at the cost of accuracy. It may start repeating itself on longer transcriptions.

If something goes wrong:
- Ensure sox is installed correctly (brew install sox). Verify that /opt/homebrew/bin/sox is correct for your installation. If it is not, point the extension to the right path in preferences or create a symlink (when building from source, you can also edit dictate.tsx).
- Make sure Raycast (or your terminal, if you run sox manually) has microphone permission in System Settings > Privacy & Security > Microphone.
- Check that the model path points to a valid .bin file.
- Check that the whisper.cpp executable path is correct (for older whisper.cpp builds this is the main binary) and that whisper.cpp itself runs on your system.
- Check the extension logs for errors (Developer Tools > Show Extension Logs).
- If building from source, run npm install and npm run build in the extension directory.
- If the AI model cannot be found, run ollama ls to check the models installed with Ollama, or look up model names for external APIs in their documentation.

This project is licensed under the MIT License; see the LICENSE file for details.
Thanks to the whisper.cpp project.