AI in African languages to bridge the digital divide

 


Initiatives are underway in South Africa, Kenya, and Nigeria to create more AI tools in African languages. Researchers have already recorded over 9,000 hours of local speech to expand access to this revolutionary technology across the continent.


As the use of artificial intelligence (AI)-based language models explodes, with many AI-powered websites integrated into daily life, some in Africa fear that their populations will be excluded from predominantly Westernized software.


Africa is home to more than a quarter of the world's languages, according to some estimates, yet this diversity is still poorly represented in the development of AI.

The main reason? A lack of investment and accessible data. Most AI tools, such as ChatGPT, are trained on texts in English, other European languages, or Chinese, all of which have huge bodies of text available online.


However, many African languages, primarily oral, lack sufficient written texts to support machine learning, thus limiting their integration into these technologies. As a result, millions of people are left behind.


A major breakthrough: a single dataset

Researchers have recently published what is believed to be the largest linguistic dataset for African languages. "We think in our languages, dream in them, and interpret the world through them. If technology does not reflect this reality, a portion of the population will be left behind," explains Professor Vukosi Marivate of the University of Pretoria, who participated in the project.


The "African Next Voices" initiative brought together linguists and computer scientists to create AI-ready datasets in 18 African languages. While this covers only a fraction of the more than 2,000 languages spoken on the continent, participants hope to expand this database in the future.


Over two years, they recorded 9,000 hours of speech in everyday contexts related to agriculture, health, and education in Kenya, Nigeria, and South Africa. Languages recorded included Kikuyu and Dholuo in Kenya, Hausa and Yoruba in Nigeria, and isiZulu and Tshivenda in South Africa.


“You need a starting point. That’s what African Next Voices is doing. Then, others can build on it and innovate,” explains Professor Marivate. The data collection was made possible thanks to a $2.2 million grant from the Gates Foundation. The data will be freely accessible to allow developers to create translation, transcription, or interaction tools in African languages.


Concrete innovations

Several examples already illustrate the potential impact of these technologies. In Rustenburg, South Africa, farmer Kelebogile Mosime uses an application called AI-Farmer. It recognizes several local languages, such as Sotho, isiZulu, and Afrikaans, to help her with her farming tasks.


“As a novice farmer, I face many challenges,” she explains. “Thanks to this app, I can ask questions in Setswana, my native language, and get helpful answers. This is invaluable to me, especially in rural areas where access to technology is limited.”


Another South African startup, Lelapa AI, designs tools in African languages for banks and telecom companies. Its CEO, Pelonomi Moiloa, laments that current options are very limited. “English is the language of opportunity. For many, not speaking it can mean missing out on essential services like healthcare, banking, or government assistance. We want to change that,” she says.


According to Professor Marivate, these initiatives are not merely an economic or practical matter but also one of social justice. Without a concerted effort to support local languages, some fear that linguistic inequalities will worsen, leaving a significant portion of the African population behind as digital technology advances.

