Mistral launches new speech-to-text AI models

French AI startup Mistral has released a pair of new speech-to-text models that aim to set new standards for speed, privacy, and affordability.

The Paris-based company unveiled Voxtral Mini Transcribe V2 and Voxtral Realtime earlier this month, both under the umbrella of Voxtral Transcribe 2.

According to Mistral, the models take a major step forward, offering “state-of-the-art transcription quality, diarization and ultra-low latency.”

The company has high hopes that the tools will prove popular among enterprise customers, with the number of potential applications growing all the time, from virtual assistants and call center automation to broadcast subtitling and compliance documentation.

Each model is designed for different applications.

As the name suggests, Realtime is designed to process live audio, providing transcription with a delay that can be configured down to 200 milliseconds. This is made possible by what Mistral describes as a “novel streaming architecture”, which gives it an advantage over approaches that adapt offline models and process audio piecemeal.

The delay can be tuned to the use case. At 2.4 seconds, the tool is considered ideal for subtitling, while at 480 milliseconds the error rate remains low enough (around 1-2%, which is close to offline accuracy) for it to be used in voice agents, according to Mistral.

The model is also natively multilingual. It works in 13 languages (English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch), and with only four billion parameters it can run on local devices such as phones or laptops. This matters most for deployments where privacy and security are priorities.

Realtime is available under the Apache 2.0 open-source license on Hugging Face Hub or via API at $0.006 per minute.

Mini Transcribe V2, meanwhile, handles batch transcription of pre-recorded audio files, offering a range of features including comprehensive speaker diarization (with labels and start/end times), context biasing for specific topics and domains, and word-level timestamps. Up to three hours of recording can be processed in a single request, and the same 13 languages are supported as in Realtime.

Mistral says what really makes Mini Transcribe V2 stand out is its affordability: its 4% word error rate on the FLEURS transcription benchmark, combined with a cost of $0.003 per minute, is claimed to provide the best price-performance of any transcription API.
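
To put the quoted prices in perspective, the short Python sketch below estimates what a given amount of audio would cost on each model and how many batch requests a long recording would need. It uses only the figures reported above ($0.003 and $0.006 per minute, three hours per batch request). The function and dictionary names are hypothetical, and the sketch assumes simple linear per-minute billing with no minimum charges, rounding rules, or volume discounts, none of which Mistral's announcement is quoted as specifying here.

```python
import math

# Prices quoted in the article, in USD per minute of audio.
# The dictionary keys are illustrative labels, not official model identifiers.
PRICE_PER_MINUTE = {
    "mini_transcribe_v2": 0.003,  # batch model
    "realtime": 0.006,            # streaming model via API
}

# The article says a single batch request can cover up to three hours.
MAX_MINUTES_PER_REQUEST = 3 * 60


def estimate_cost(minutes: float, model: str) -> float:
    """Estimated cost in USD, assuming plain linear per-minute billing."""
    return round(minutes * PRICE_PER_MINUTE[model], 4)


def batch_requests_needed(minutes: float) -> int:
    """Number of batch requests needed if each one covers at most three hours."""
    return max(1, math.ceil(minutes / MAX_MINUTES_PER_REQUEST))


if __name__ == "__main__":
    for minutes in (60, 180, 600):
        batch = estimate_cost(minutes, "mini_transcribe_v2")
        live = estimate_cost(minutes, "realtime")
        requests = batch_requests_needed(minutes)
        print(f"{minutes} min: batch ${batch:.2f}, realtime ${live:.2f}, "
              f"{requests} batch request(s)")
```

Under these assumptions, ten hours of recorded audio would come to roughly $1.80 on the batch model, split across four requests, against about $3.60 if the same audio were streamed through Realtime.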

The company is inviting potential customers to try the models in a new audio playground in Mistral Studio, or in its Le Chat assistant.

This release marks another step forward for Mistral, which has emerged as a leading European player in the growing AI landscape and secured $2 billion in new funding last year.