kyutai: open-science AI lab

Blog

The FID Lottery | 2026-06-18 | Quantifying hidden randomness in generative-model evaluation. Every reported FID is the outcome of two lotteries.
Post-training speech models for better interactivity | 2026-06-10 | Post-training full-duplex spoken dialogue models with RL for improved interactivity.
Understanding Data Temporality Impact on LLM pre-training | 2026-05-26 | Benchmarking LLMs on questions whose answers change with time.
Introducing KE:SAI | 2026-05-20 | A new open-science lab dedicated to physical AI, created together with the ELLIS Institute Tübingen.
Pocket TTS now supports six languages | 2026-05-04 | Pocket TTS now speaks English, French, German, Spanish, Portuguese, and Italian.
MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models | 2026-04-30 | Helping Moshi answer tough questions with help from a text LLM.
ARC-Encoder: learning compressed text representations for LLMs | 2026-04-28 | Speeds up RAG, extends the context window, and transfers between LLMs.
OVIE: One View Is Enough! | 2026-04-14 | OVIE generates novel views from a single photograph, trained on unpaired internet images.
Invincible Voice online demo released | 2026-02-24 | An AI dialogue assistant designed to help people living with ALS communicate more effectively.
Hibiki-Zero: Simultaneous Speech-to-Speech Translation Without Aligned Data | 2026-02-12 | Real-time speech translation from four languages.
Pocket TTS: a high-quality TTS with voice cloning that runs on CPU | 2026-01-13 | A tiny TTS with any voice you like.
Neural audio codecs: how to get audio into LLMs | 2025-10-21 | Why modeling audio is harder than text, and how to make it feasible with neural audio codecs.
Kyutai TTS and Unmute now open-source | 2025-07-03 | Announcing the open-source release of Kyutai TTS 1.6B and Unmute, with benchmarks and project details.
Kyutai TTS 1.6B | 2025-07-03 | The nitty-gritty of Kyutai TTS 1.6B, our text-to-speech model.
Kyutai Speech-To-Text released as open-source | 2025-06-19 | Announcing the open-source release of Kyutai STT, the streaming speech-to-text model powering Unmute.
Unmute: Make LLMs listen and speak | 2025-05-22 | Modular voice AI that empowers any text LLM with real-time speech-to-text and text-to-speech.
Helium 1: a modular and multilingual LLM | 2025-04-30 | Announcing Helium 1: a 2B parameter modular and multilingual language model, open-sourced for reproducibility.
MoshiVis: Teaching Moshi to Converse about Images | 2025-03-21 | An open-source Vision Speech Model with low-latency and natural conversation skills.
Simultaneous, on-device, high-fidelity speech-to-speech translation with Hibiki | 2025-02-10 | Announcing Hibiki: simultaneous, on-device, high-fidelity speech-to-speech translation.
Announcing Helium-1 Preview | 2025-01-13 | Preview release of Helium-1, a lightweight multilingual language model for edge and mobile devices.
Moshi open-source release: run Moshi locally! | 2024-09-18 | Release announcement and technical details for Moshi, Helium, and Mimi.
Meet Moshi, the first real-time voice AI | 2024-07-03 | Introducing Moshi, a real-time voice AI that brings expressive, spontaneous spoken interaction to machines.
Hello Kyutai! | 2023-11-17 | Announcing Kyutai, a Paris-based non-profit AI research lab dedicated to open science.