Text-to-speech

Kyutai text-to-speech started as an internal tool we used during the development of Moshi. As part of our commitment to open science, we've since open-sourced two text-to-speech models:

Kyutai Pocket TTS

Released in January 2026, Kyutai Pocket TTS is our newest model. With a mere 100 million parameters, it's lightweight enough to run on a CPU in real time.
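To make "real time" concrete, here is a minimal sketch of the usual real-time factor check: synthesis time divided by the duration of the audio produced, with a value at or below 1 meaning the model keeps up with playback. The function names and the timings in the example are hypothetical and are not measurements of Pocket TTS.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def is_real_time(synthesis_seconds: float, audio_seconds: float) -> bool:
    """A factor of 1.0 or less means audio is produced at least as fast as it plays."""
    return real_time_factor(synthesis_seconds, audio_seconds) <= 1.0


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    print(real_time_factor(2.0, 8.0))
    print(is_real_time(2.0, 8.0))
```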

The demo below runs on a remote CPU, but a single command is all it takes to run the model on your own machine. See the technical report for instructions.

You can also clone a voice from any audio sample using our repo, and more voices are available in our voices repository. We recommend cleaning the sample before using it with Pocket TTS.

Kyutai TTS 1.6B

Released in July 2025, Kyutai TTS 1.6B is a model based on our research on delayed streams modeling. This technique allows the model to start generating audio before the entire text input is available, which makes it well suited to low-latency applications such as voice assistants.
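The idea of starting audio before the text is complete can be illustrated with a toy sketch. It assumes the audio stream simply trails the text stream by a fixed number of steps; the `delayed_stream` function, the `PAD` marker, and the `audio[...]` placeholders are hypothetical and do not reflect the actual Kyutai TTS implementation.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

PAD = "<pad>"  # hypothetical marker fed once the text input has ended


def delayed_stream(
    text_tokens: Iterable[str], delay: int
) -> Iterator[Tuple[int, str, Optional[str]]]:
    """Yield (step, text_in, audio_out), with audio trailing text by `delay` steps."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    seen: List[str] = []
    source = iter(text_tokens)
    exhausted = False
    step = 0
    while True:
        token = PAD
        if not exhausted:
            try:
                token = next(source)
                seen.append(token)
            except StopIteration:
                exhausted = True
        src = step - delay  # index of the text token voiced at this step
        if exhausted and src >= len(seen):
            break  # every token has been voiced
        # Placeholder string standing in for a real audio frame.
        audio = f"audio[{seen[src]}]" if src >= 0 else None
        yield step, token, audio
        step += 1


if __name__ == "__main__":
    # Text may arrive incrementally, for example from a language model.
    for row in delayed_stream(iter(["hello", "world", "again"]), delay=1):
        print(row)
```

With a delay of one step, the first audio placeholder is emitted while later text tokens are still arriving, which is the property that matters for low-latency use.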

To try the TTS interactively in real time, check out Unmute.

Want more voices? You can anonymously donate your voice to be added to the voice repository for use with Kyutai TTS.