⚠️ A valid COVID certificate must be presented on site to enter the event. ⚠️
Machine Learning (ML) is having a huge impact on the automation of tedious and repetitive tasks in several industries. In this workshop, we take the digitalization of paper documents as an example to show how standard ML techniques can make a big difference in this context.
Many institutions deal with a large number of paper documents (invoices, vouchers, …) every day. These documents are often processed by employees with extensive business knowledge, who enter the data manually into a database or another type of data storage. This is clearly an inefficient use of resources, so automating this task is a priority for many institutions.
The goal of this workshop is to show you how ML can be used in the automation of such a process. You will see how to implement algorithms to:
- Classify scanned documents (using a CNN model implemented with Keras)
- Detect the position of a few fields within the document (using bounding box regression, implemented in Keras)
- Extract the information from these fields (using Tesseract)
- Use human feedback to improve the system's performance (human-in-the-loop, or HITL)
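To give a flavour of the last step, here is a minimal sketch of one common way to wire human feedback into such a pipeline: predictions whose confidence falls below a threshold are sent to a human reviewer, and the reviewer's answers are stored as new labelled examples for retraining. This is an illustration only, not the code used in the workshop notebooks; the names `Prediction` and `ReviewQueue`, the 0.9 threshold, and the sample documents are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    doc_id: str
    label: str          # class predicted by the document classifier
    confidence: float   # model probability for that class, in [0, 1]


@dataclass
class ReviewQueue:
    # Hypothetical cut-off: predictions below it go to a human reviewer.
    threshold: float = 0.9
    # (doc_id, label) pairs confirmed or corrected by a human, kept for retraining.
    labelled: list = field(default_factory=list)

    def needs_review(self, pred: Prediction) -> bool:
        return pred.confidence < self.threshold

    def record_feedback(self, pred: Prediction, human_label: str) -> bool:
        """Store the human label; return True if it corrected the model."""
        self.labelled.append((pred.doc_id, human_label))
        return human_label != pred.label


queue = ReviewQueue(threshold=0.9)
predictions = [
    Prediction("doc-001", "invoice", 0.98),
    Prediction("doc-002", "voucher", 0.61),
]

# Only the low-confidence prediction is routed to a person.
to_review = [p for p in predictions if queue.needs_review(p)]

# Stand-in for the reviewer's answer on doc-002 (made up for this sketch).
was_corrected = queue.record_feedback(to_review[0], "invoice")
```

In this sketch, the pairs collected in `queue.labelled` would be added to the training set the next time the classifier is retrained. The threshold trades reviewer workload against the number of errors that slip through unreviewed.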
The notebooks and slides are available here.
This workshop will give the participants a simple and intuitive introduction to image recognition, object detection and a few key aspects of a HITL feedback system.
Participants will experience first-hand the potential and limitations of these standard computer vision techniques when applied to the digitalization of scanned documents.
Participants will build, modify and play with these algorithms and explore how simple ML algorithms can automate a tedious and repetitive task.
Beginner level
- Basic understanding of supervised learning
- Beginner level in a programming language (Python, VBA, C++, R, or Matlab for example)
- Own laptop with modern browser