Text-to-speech
Kyutai text-to-speech started as an internal tool we used during the development of Moshi. As part of our commitment to open science, we've since open-sourced two text-to-speech models:
- Kyutai Pocket TTS, a tiny model with voice cloning, fast enough to run on a CPU.
- Kyutai TTS 1.6B, a streaming model that powers Unmute, well suited to server deployments.
Kyutai Pocket TTS
Released in January 2026, Kyutai Pocket TTS is our newest model. With a mere 100 million parameters, it's lightweight enough to run in real time on a CPU.
The demo on this page runs on a remote CPU, but running the model on your own machine takes only a single command. Check out the technical report for instructions.
Kyutai TTS 1.6B
Released in July 2025, Kyutai TTS 1.6B builds on our research into delayed streams modeling. This technique lets the model start generating audio before the entire text input is available, which makes it ideal for low-latency applications such as voice assistants.
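To make the idea of generating audio before the text is complete more concrete, here is a toy scheduling sketch. It is not Kyutai's implementation. It assumes a simplified reading of delayed streams modeling in which text and audio are two time-aligned streams and the audio stream lags the text by a fixed number of steps, with exactly one "audio frame" per text token. Both `delayed_streams` and `synthesize_frame` are hypothetical names, and the "audio" is just a placeholder string.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

def synthesize_frame(token: str) -> str:
    # Hypothetical stand-in for the acoustic model: one "audio frame" per text token.
    return f"audio({token})"

def delayed_streams(
    text: Iterable[str], delay: int
) -> Iterator[Tuple[int, Optional[str], Optional[str]]]:
    """Toy schedule for two time-aligned streams, audio lagging text by `delay` steps.

    At step t the model reads text token t (if any is left) and emits the audio
    frame for text token t - delay (if that index is valid). Audio can start after
    only `delay` tokens instead of waiting for the whole input.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    seen: List[str] = []
    step = 0
    for token in text:  # consumed lazily, so `text` may be a live stream of tokens
        seen.append(token)
        j = step - delay
        frame = synthesize_frame(seen[j]) if j >= 0 else None
        yield step, token, frame
        step += 1
    if not seen:
        return
    # The text stream has ended: keep stepping until the delayed audio catches up.
    while step - delay < len(seen):
        j = step - delay
        frame = synthesize_frame(seen[j]) if j >= 0 else None
        yield step, None, frame
        step += 1

for step, token, frame in delayed_streams(["The", "cat", "sat", "down"], delay=2):
    print(step, token, frame)
```

In this toy, with `delay=2` the first audio frame appears at step 2 while text tokens are still arriving. A larger delay gives the model more text lookahead before it commits to audio, at the cost of a longer wait before the first sound.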
To try the TTS interactively in real time, check out Unmute.
Want more voices? You can anonymously donate your voice to be added to the voice repository for use with Kyutai TTS.