Exploring the Frontier: The Challenges and Opportunities in Building Kenya's National Language Corpus

Talk / Overview

Over the last two years, a team of researchers has developed KenCorpus, an open-source collection of text and speech data in three widely spoken Kenyan languages: Swahili, Dholuo, and Luhya. In parallel, other contributors have crowdsourced a Swahili dataset through Mozilla's Common Voice (MCV) platform. These datasets are foundational resources for developers building applications such as chatbots or machine translation services. But how can an aspiring NLP developer make use of such datasets? And how can such a repository continue to grow with and for communities?

Talk / Speakers

Mark Irura Gachara

Technical Advisor, FAIR Forward: Artificial Intelligence for All

Event: AMLD Africa 2024