Text-to-speech

Kyutai text-to-speech started as an internal tool we used during the development of Moshi. As part of our commitment to open science, we've since open-sourced two text-to-speech models:

Kyutai Pocket TTS, a tiny model with voice cloning, fast enough to run on CPU.
Kyutai TTS 1.6B, a streaming model used in Unmute, great for servers.

Kyutai Pocket TTS

Released in January 2026, Kyutai Pocket TTS is our newest model. With only 100 million parameters, it is lightweight enough to run in real time on a CPU. In April, we extended it to speak five additional languages.

The demo on this page runs on a remote CPU, but running the model on your own machine takes only a single command. See the technical report for instructions.

When you run the model locally, you can also clone a voice from any audio sample.

Kyutai TTS 1.6B

Released in July 2025, Kyutai TTS 1.6B is based on our research on delayed streams modeling. This technique lets the model start generating audio before the entire text input is available, which makes it ideal for low-latency applications such as voice assistants.
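
To make the streaming idea concrete, here is a toy Python sketch in which an audio stream lags a text stream by a fixed number of steps. It is not Kyutai's implementation: `delayed_streams`, `fake_synthesize`, and the `delay` parameter are hypothetical, and the "audio" is just a string label. It only shows that output for a word can begin once a small amount of look-ahead text has arrived, rather than after the whole input.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def fake_synthesize(token: str) -> str:
    # Hypothetical stand-in for an acoustic model: returns a label, not audio.
    return f"<audio:{token}>"


def delayed_streams(text: Iterable[str], delay: int = 2) -> Iterator[Tuple[int, str]]:
    """Toy sketch: yield (step, audio_chunk) with audio lagging text by `delay` steps."""
    buffer: deque = deque()
    step = 0
    for token in text:
        buffer.append(token)  # the text stream advances by one token
        if len(buffer) > delay:
            # Enough look-ahead: emit audio for the token seen `delay` steps ago.
            yield step, fake_synthesize(buffer.popleft())
        step += 1
    # The text stream has ended: flush the remaining tokens.
    while buffer:
        yield step, fake_synthesize(buffer.popleft())
        step += 1


def incoming_text() -> Iterator[str]:
    # Simulates text arriving word by word, e.g. from a language model.
    yield from "hello there how are you".split()


if __name__ == "__main__":
    for step, chunk in delayed_streams(incoming_text(), delay=2):
        print(step, chunk)
```

With `delay=2`, the first audio chunk is produced at step 2, while the last word of the text only arrives at step 4. A larger delay gives the model more context per word at the cost of higher latency.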

To try out the TTS in an interactive real-time way, check out Unmute.

Want more voices? You can anonymously donate your voice to be added to the voice repository for use with Kyutai TTS.