Talk / Overview

Although diverse and rich, the Arabic language, and Arabic dialects in particular, are still underrepresented in deep learning and not yet fully exploited by it because of the lack of available data. In this presentation, we introduce a methodology for collecting, cleaning, and preprocessing Tunisian dialect data in order to create a language model and fine-tune it for different NLP tasks.