Workshop repo: https://github.com/mar-muel/artificial-self-AMLD-2020
Note: To use your own chat logs for the workshop, we highly recommend that you download them in advance, as this process can take some time. Follow the guide here: https://github.com/MasterScrat/Chatistics#exporting-your-chat-logs
As an alternative to using your own chat logs, we will provide other conversational datasets.
Fine-tuning large language models, such as OpenAI’s GPT-2, and making them generate tweets, poetry or even entire novels has become surprisingly simple. When fed suitable textual data, these models learn not only the vocabulary and rules encoded in the input, but also how to imitate the writing style and characteristics of the author.
Generating arbitrary text is interesting in itself. However, we want to be able to interact with the model and get reasonable answers to our questions. For this to work, we also need to teach the model how conversations work. By the end of this workshop, you will be able to build a conversational AI system you can talk with.
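One common way to teach a language model how conversations work is to serialize each chat into a single string in which every turn is prefixed by a speaker marker. The sketch below illustrates the idea only. The marker names `<speaker1>` and `<speaker2>` and the helper functions `serialize_conversation` and `build_prompt` are assumptions made for this example, and the format used in the workshop may differ.

```python
# Hypothetical speaker markers, chosen for illustration only.
SPEAKER_TOKENS = {"me": "<speaker1>", "other": "<speaker2>"}
END_OF_TEXT = "<|endoftext|>"  # GPT-2's end-of-text token


def serialize_conversation(turns):
    """Join (speaker, message) pairs into one training string."""
    parts = [f"{SPEAKER_TOKENS[speaker]} {message.strip()}" for speaker, message in turns]
    return " ".join(parts) + f" {END_OF_TEXT}"


def build_prompt(history, next_speaker="me"):
    """Serialize the history and end with the marker of the speaker who should reply."""
    parts = [f"{SPEAKER_TOKENS[speaker]} {message.strip()}" for speaker, message in history]
    parts.append(SPEAKER_TOKENS[next_speaker])
    return " ".join(parts)


example = [("other", "Are you coming tonight?"), ("me", "Yes, around eight.")]
training_text = serialize_conversation(example)
prompt = build_prompt([("other", "Are you coming tonight?")])
```

A model fine-tuned on strings like `training_text` can then be given `prompt`, which ends with the marker of the speaker to imitate, so that its continuation reads as that speaker's reply.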
Participants will learn the basics of language model fine-tuning for text generation and how to build a conversational AI system. They will also be introduced to the popular PyTorch Transformers library by Hugging Face.
Intermediate level
- Basic Python & Machine Learning understanding
- Own laptop
- Google account (we will use Google Colab)