A large-scale open resource for African language speech technology


Anchoring the African AI ecosystem

Crucial to the WAXAL project was our commitment to work with, and contribute directly to, the African AI ecosystem. The data collection effort was led entirely by African academic and community organizations, guided by Google experts on world-class data collection practices. This collaborative approach ensures that the resource is built by and for the communities it serves. Each partner focused on a specific subset of languages using a shared methodology. Our partners include Makerere University, which collected ASR and/or TTS data for nine languages, and the University of Ghana, which focused on eight languages using the image-prompted ASR data collection method mentioned above. Additional key collaborators were Digital Umuganda, which, in partnership with Addis Ababa University, played an important role in leading ASR collection for many regional languages. For high-quality, studio-recorded voices, Media Trust, Loud and Clear, and the African Institute for Mathematical Sciences (AIMS) Senegal led TTS recording in various regional languages.

This framework rests on the principle that our partners retain ownership of the data they collect, in exchange for a shared commitment to make all datasets openly available to the broader community. This deep collaboration and open-access philosophy have already enabled notable derivative research and publications.

  • Through this framework, our partners have already enabled new research, such as the development of a cookbook for community-driven collection of impaired speech. This research produced the first open-source dataset of speech from Akan speakers with conditions such as cerebral palsy and stuttering, and demonstrated that image-cued prompts are more effective than text-based prompts for these populations. This work provides an important roadmap for developing inclusive speech technologies in low-resource settings.
  • In addition, the initiative supported a major study that presented a 5,000-hour speech corpus for five Ghanaian languages: Akan, Ewe, Dagbani, Dagaare, and Ikposo. By using a controlled crowdsourcing approach to capture natural, spontaneous speech, this work laid the groundwork for building robust ASR and TTS systems tailored to the linguistic diversity of West Africa.
  • Other key research benchmarked four state-of-the-art models (Whisper, XLS-R, MMS, and W2v-BERT) across 13 African languages. The study analyzed how performance improves as training data grows, offering important insights into data efficiency and showing that the benefits of scaling depend strongly on linguistic complexity and domain alignment.
  • Finally, a systematic literature review cataloged 74 datasets covering 111 African languages to map the current landscape of speech technology. The review emphasized the urgent need for multi-domain conversational corpora and for linguistically informed metrics, such as character error rate (CER), to better evaluate performance on morphologically rich and tonal languages.
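To make the CER point concrete: CER is commonly computed as the character-level edit distance (substitutions + deletions + insertions) between a reference transcript and a system hypothesis, divided by the number of characters in the reference. In a morphologically rich language, one wrong morpheme inside a long word counts as a whole-word error under word error rate (WER), while CER only penalizes the affected characters. The Python sketch below illustrates this under stated assumptions; it is not the review's evaluation code. The function names, the choice to count spaces as characters, and the Unicode NFC normalization step (so that a tone mark stored as a precomposed character and the same mark stored as a combining character compare equal) are all illustrative assumptions, and the toy strings are hypothetical.

```python
import unicodedata
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # cost of building each hyp prefix from an empty ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hyp needs i deletions
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when the symbols match)
            ))
        prev = curr
    return prev[-1]


def _normalize(text: str) -> str:
    # Assumption: NFC so precomposed and combining diacritics compare equal.
    return unicodedata.normalize("NFC", text)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; this sketch counts spaces as characters."""
    ref, hyp = _normalize(reference), _normalize(hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens, for comparison."""
    ref, hyp = _normalize(reference).split(), _normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical toy pair: a single wrong character inside a long "word".
reference = "abcdefghij klm"
hypothesis = "abcdefghix klm"
print(f"CER = {cer(reference, hypothesis):.3f}")  # CER = 0.071
print(f"WER = {wer(reference, hypothesis):.3f}")  # WER = 0.500
```

The same one-character mistake costs half the words under WER but only 1 of 14 characters under CER. Real evaluations also differ in how they handle case, punctuation, and whitespace, so those normalization choices should be reported alongside any CER figure.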
