• The workshop will be divided into two parts: a tutorial where attendees will learn about the basics of Natural Language Processing (NLP), and a practice session where attendees will get to analyze a dataset of Moroccan Darija and present their findings.
• NLP is a field in high demand, and research in it progresses quickly. While language technology for languages like English and French is highly developed, low-resource languages (including most indigenous African languages) have been left behind and marginalized. This leaves many opportunities to create new tools for languages with few resources. In this tutorial, we take the example of Moroccan Darija, the national vernacular of Morocco. Our use-case dataset will be the Moroccan Darija Wikipedia.
• In the tutorial, participants will first learn statistical tools for analyzing language. It covers NLP concepts including text pre-processing and tokenization, n-gram language modeling, n-gram frequency, topic modeling, and word embeddings, combining theoretical definitions with concrete examples in Python. Participants then move on to the practice part of the workshop, in teams of 1 to 5 people. Each team will be given the Moroccan Darija Wikipedia and will analyze the dataset from an angle of their choice. At the end of the workshop, the teams will be invited to show their findings in a short presentation.
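To give a flavor of the first two tutorial topics (tokenization and n-gram frequency), here is a minimal sketch using only the Python standard library. It is an illustration, not the workshop's actual material: the function names `tokenize` and `ngram_counts` are hypothetical, the sample sentence is an English placeholder rather than real Darija data, and the regex tokenizer is deliberately naive (for instance, it would split Arabic-script words at diacritics).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of word characters.

    Naive on purpose: punctuation is dropped and no language-specific
    normalization is applied.
    """
    return re.findall(r"\w+", text.lower())


def ngram_counts(tokens, n):
    """Count every sequence of n consecutive tokens."""
    # zip over n shifted copies of the token list yields the n-grams as tuples.
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Placeholder text (an assumption for illustration, not workshop data).
sample = "the cat sat on the mat and the cat slept"
tokens = tokenize(sample)
bigrams = ngram_counts(tokens, 2)

print(bigrams.most_common(1))  # [(('the', 'cat'), 2)]
```

The same `ngram_counts` call with `n=1` or `n=3` gives unigram or trigram frequencies, which is the kind of count an n-gram language model is built from.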
- Gain basic knowledge of NLP in the tutorial.
- Practice analyzing text data in the second part of the workshop.
- Practice presenting the results of a data analysis.
The workshop is recommended for North African people who aspire to be data scientists, NLP and/or Machine Learning researchers and practitioners, and people interested in computational linguistics.
Participants will have the option to receive a certificate of participation for this workshop, stating the number of hours attended.
Beginner level
• Intermediate familiarity with Python
• No prior knowledge of Natural Language Processing or Machine Learning is required
• Familiarity with any North African Darija is recommended
• Computer with Internet, Python 3, and Jupyter Notebook