Kyutai text-to-speech started as an internal tool we used during the development of
Moshi.
As part of our commitment to open science, we've since open-sourced two text-to-speech models:
Kyutai Pocket TTS, a tiny model with voice cloning that is fast enough to run on a CPU,
and Kyutai TTS 1.6B, a streaming model built for low-latency applications.
Released in January 2026, Kyutai Pocket TTS is our newest model.
With a mere 100 million parameters, it's lightweight enough to run on a CPU in real time.
The demo below runs on a remote CPU, but it only takes a single command to run the model on your own machine.
Check out the technical report for instructions.
Released in July 2025, Kyutai TTS 1.6B is a model based on our research on delayed streams modeling.
This technique allows the model to start generating audio before the entire text input is available,
which makes it ideal for low-latency applications such as voice assistants.
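To make the streaming idea concrete, here is a toy sketch in Python. It is not Kyutai's model or API: `stream_tts`, `typed_text`, the `delay` parameter, and the `<audio:...>` placeholder strings are all hypothetical names made up for illustration. It only shows the general pattern of emitting output for early words while later words are still arriving, with the output lagging the input by a fixed number of words.

```python
from collections import deque
from typing import Iterable, Iterator


def stream_tts(words: Iterable[str], delay: int = 2) -> Iterator[str]:
    """Yield one placeholder audio chunk per word, lagging the text by `delay` words.

    Hypothetical toy function: a real TTS model would emit audio frames here.
    """
    buffer = deque()
    for word in words:
        buffer.append(word)
        # Once more than `delay` words are buffered, emit audio for the oldest one.
        if len(buffer) > delay:
            yield f"<audio:{buffer.popleft()}>"
    # The text stream has ended: flush the words still waiting in the buffer.
    while buffer:
        yield f"<audio:{buffer.popleft()}>"


def typed_text() -> Iterator[str]:
    """Stand-in for text that arrives incrementally, e.g. from a language model."""
    for word in "hello there how are you".split():
        print(f"text in:   {word}")
        yield word


for chunk in stream_tts(typed_text(), delay=2):
    print(f"audio out: {chunk}")
```

Running it interleaves "text in" and "audio out" lines: the first audio chunk appears as soon as the third word has arrived, well before the sentence is complete.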
To try out the TTS interactively and in real time, check out
Unmute.
Want more voices? You can anonymously donate your voice
to be added to the voice repository for use with Kyutai TTS.