This ASR actually handles 52 languages


Author(s): Gautam Boyina

Originally published on Towards AI.

And the forced alignment model is the interesting part

I have tested dozens of speech recognition models over time. Most claim multilingual support but quietly break down when you give them actual Chinese dialects, accented English, or anything beyond standard broadcast audio. The ones that work well are usually proprietary APIs that are inconveniently expensive.


Alibaba’s Qwen team has introduced Qwen3-ASR, an open-source speech recognition system supporting 52 languages and dialects. Key models include Qwen3-ASR-1.7B, which claims state-of-the-art performance for multilingual tasks, and Qwen3-ForcedAligner-0.6B, a non-autoregressive model for accurate speech-text alignment. Together they promise better handling of Chinese dialects and multilingual user-generated content, plus more accurate timestamps for applications that need precise audio-text synchronization.
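
To make the audio-text synchronization use case concrete, here is a minimal sketch of what you can do once a forced aligner has produced word-level timestamps: group the words into subtitle cues in SRT format. This is not the Qwen3-ASR or Qwen3-ForcedAligner API. The `AlignedWord` type, the `words_to_srt` helper, and the sample timestamps are all hypothetical and only stand in for whatever an aligner returns.

```python
from dataclasses import dataclass


@dataclass
class AlignedWord:
    """Hypothetical container for one aligned word (not a Qwen3 type)."""
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def to_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[AlignedWord], max_words: int = 4) -> str:
    """Group aligned words into fixed-size cues and render them as SRT."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        index = i // max_words + 1
        start = to_srt_time(chunk[0].start)
        end = to_srt_time(chunk[-1].end)
        # Joining with spaces assumes a space-delimited language.
        text = " ".join(w.text for w in chunk)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    # Made-up timestamps, purely for illustration.
    sample = [
        AlignedWord("hello", 0.0, 0.4),
        AlignedWord("world", 0.5, 0.9),
    ]
    print(words_to_srt(sample))
```

Each cue takes its start from its first word and its end from its last word, so the subtitles are only as good as the word timestamps, which is why alignment accuracy matters. For text without spaces between words, such as Chinese, you would join the chunk differently.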

