A new open source AI model trained on the languages and cultures of Latin America has been introduced by the Andean nation of Chile.
Under a program coordinated by CENIA, Chile’s National Center of Artificial Intelligence, LATAM-GPT was developed by scientists, researchers, and professionals from more than 60 institutions across 15 Latin American and Caribbean countries.
Chile’s Ministry of Science, Technology, Knowledge and Innovation, AWS, and the Development Bank of Latin America and the Caribbean also participated in the effort.
The model was built with language, data, and context specific to Latin America and the Caribbean, amid growing global unease over the dominance of large US technology vendors and the rapidly growing push for sovereign AI.
“Unlike models trained primarily with information in English and cultural frameworks from the Global North, LATAM-GPT understands the cultural and linguistic nuances as well as the historical and political contexts of Latin America,” according to a CENIA release. CENIA launched the model at an event in Santiago on 10 February, with CENIA director Alvaro Soto saying that LATAM-GPT enables Latin America “to join the AI revolution as a major player”.
He was supported by the country’s Science Minister, Aldo Valle, who said: “This project stems from the conviction that regional integration is the only realistic path to achieving technological sovereignty with a democratic objective.” Also present was Chilean President Gabriel Boric, who welcomed the model’s release in a post on X.
The need for a technology like LATAM-GPT seems clear, given that research has shown that data in Spanish, the language spoken by the majority of people in Latin America, accounts for only about 4% of the data used to train language models so far. Portuguese, the primary language of Brazil, accounts for only 2% of the training data.
LATAM-GPT’s main content is in Spanish and Portuguese, and the project aims to include indigenous languages as well.
The model was developed on the base architecture of Meta’s open Llama 3.1 model, with 70 billion parameters, and trained on officially approved texts obtained with permission.
In total, more than 300 billion plain-text tokens – equivalent to approximately 230 billion words – were collected under license into what CENIA describes as a “high-quality dataset”.
However, LATAM-GPT’s ability to penetrate an AI market dominated by a few American and Chinese companies may be limited. As the AP reports, it was developed on a budget of only $550,000.
Still, LATAM-GPT’s availability on Hugging Face and GitHub suggests that some see it as useful foundational infrastructure for those wishing to develop future regional applications.
