Workshop / Overview

⚠️ A valid COVID certificate must be presented on site to enter the event. ⚠️

Machine learning (ML) is having a major impact on the automation of tedious and repetitive tasks in several industries. In this workshop, we use the digitalization of paper documents as an example to show how standard ML techniques can make a real difference in this context.

Many institutions deal with a large number of paper documents every day (invoices, vouchers, …). These documents are often processed by employees with extensive business knowledge, who enter the data manually into a database or another type of data storage. This is clearly an inefficient use of resources, so automating this task is a priority for many institutions.

The goal of this workshop is to show you how ML can be used to automate such a process. You will see how to implement algorithms to:

  • Classify scanned documents (using a CNN model implemented with Keras)
  • Detect the position of a few fields within the document (using bounding box regression, implemented in Keras)
  • Extract the information from these fields (using Tesseract)
  • Use human feedback to improve the system's performance (human-in-the-loop, or HITL)
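
As a rough illustration of two of the steps listed above, the sketch below shows one common way to score a predicted field position (intersection over union, IoU) and one simple way to decide which predictions go to a human reviewer. It is not taken from the workshop notebooks: the names `iou`, `Prediction` and `split_for_review`, the box format, and the 0.9 confidence threshold are all assumptions chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Prediction:
    doc_id: str        # hypothetical document identifier
    label: str         # predicted document class, e.g. "invoice"
    confidence: float  # classifier probability in [0, 1]


def split_for_review(
    preds: List[Prediction], threshold: float = 0.9
) -> Tuple[List[Prediction], List[Prediction]]:
    """Accept confident predictions; send the rest to a human reviewer."""
    accepted = [p for p in preds if p.confidence >= threshold]
    to_review = [p for p in preds if p.confidence < threshold]
    return accepted, to_review


if __name__ == "__main__":
    preds = [
        Prediction("doc-1", "invoice", 0.98),
        Prediction("doc-2", "voucher", 0.55),
    ]
    accepted, to_review = split_for_review(preds)
    print([p.doc_id for p in accepted], [p.doc_id for p in to_review])
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

In a human-in-the-loop setup of this kind, the labels corrected by the reviewer would typically be added to the training set for the next round of training. The threshold controls the trade-off between manual workload and the risk of accepting wrong predictions.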

The notebooks and slides are available here.

Workshop / Outcome

This workshop gives participants a simple and intuitive introduction to image recognition, object detection, and a few key aspects of a HITL feedback system.

Participants will experience first-hand the potential and limitations of these standard computer vision techniques when applied to the digitalization of scanned documents.

Participants will build, modify, and experiment with these simple algorithms and explore the role of ML in the automation of a tedious and repetitive task.

Workshop / Difficulty

Beginner level

Workshop / Prerequisites

  • Basic understanding of supervised learning
  • Beginner level in a programming language (for example Python, VBA, C++, R, or MATLAB)
  • Your own laptop with a modern browser

Track / Co-organizers

Valerio Rossetti

Co-Founder, SamurAI

Giulio Cornelio Grossi

Senior Quantitative Portfolio Manager, One Swiss Bank

