This ASR actually handles 52 languages

Author(s): Gautam Boyina

Originally published on Towards AI.

And the forced alignment model is the interesting part

I have tested dozens of speech recognition models over time. Most claim multilingual support but quietly break down when you give them actual Chinese dialects, accented English, or anything beyond standard broadcast audio. The ones that work well are usually proprietary APIs, and those are inconveniently expensive.


Qwen3-ASR, from GitHub

Alibaba’s Qwen team has introduced Qwen3-ASR, an open-source speech recognition system supporting 52 languages and dialects. Key models include Qwen3-ASR-1.7B, for which the team claims state-of-the-art performance on multilingual tasks, and Qwen3-ForcedAligner-0.6B, a non-autoregressive model for accurate speech-text alignment. Together they are meant to improve handling of Chinese dialects and user-generated content in multiple languages, and to raise timestamp accuracy for applications that need precise audio-text synchronization.
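To make the audio-text synchronization use case concrete, here is a minimal sketch of what you might do with word-level timestamps once an aligner has produced them: group the words into subtitle cues in SRT format. It does not call Qwen3-ForcedAligner or any Qwen library. The `(word, start, end)` tuple layout, the function names, and the sample values are assumptions made up for illustration, not the model's actual output format.

```python
from typing import List, Tuple

# Hypothetical aligner output: (word, start_seconds, end_seconds).
# This layout is an assumption for illustration, not Qwen3-ForcedAligner's real format.
Word = Tuple[str, float, float]


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 4) -> str:
    """Group consecutive words into subtitle cues of at most max_words each."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = chunk[0][1]   # start time of the first word in the cue
        end = chunk[-1][2]    # end time of the last word in the cue
        text = " ".join(w for w, _, _ in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


# Made-up sample data standing in for real alignment results.
example = [
    ("speech", 0.00, 0.42),
    ("recognition", 0.42, 1.10),
    ("with", 1.10, 1.25),
    ("timestamps", 1.25, 2.00),
    ("works", 2.10, 2.50),
]

if __name__ == "__main__":
    print(words_to_srt(example))
```

The point of the sketch is that every cue boundary comes straight from the word timestamps, so any error in alignment shows up directly as subtitles that appear early or late.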

Read the entire blog for free on Medium.
