Fraud detection with unsupervised machine learning

09:00-12:30, January 25

Workshop / Overview

In many fraud detection (or, more generally, outlier detection) settings, labelled data is not available. We therefore have to resort to unsupervised methods to identify points that are in some way atypical.

The workshop opens with a short introduction to the main outlier detection methods (from the classic DBSCAN to more recent algorithms such as Isolation Forest and autoencoders) and to appropriate metrics for highly imbalanced datasets, with an extra focus on "cost-sensitive" measures.
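To give a flavour of two of the methods named above, here is a minimal sketch using scikit-learn's IsolationForest and DBSCAN. It is not taken from the workshop material: the data is synthetic, and the parameter values (eps, min_samples, n_estimators) are assumptions chosen only to make the toy example work.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)

# Synthetic, made-up data: 500 "typical" points around the origin,
# followed by 10 hand-placed points far away from the bulk.
X_typical = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
X_odd = np.array([[8.0 + 2 * i, -8.0 - 2 * i] for i in range(10)])
X = np.vstack([X_typical, X_odd])

# Isolation Forest: score_samples returns lower values for more
# anomalous points, so the sign is flipped to get "higher = more suspicious".
forest = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
forest.fit(X)
forest_scores = -forest.score_samples(X)

# DBSCAN: points that belong to no dense cluster get the label -1 ("noise"),
# which can be read as an outlier flag.
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
dbscan_outliers = dbscan_labels == -1

# Indices of the 10 points the forest considers most suspicious.
top10 = np.argsort(forest_scores)[-10:]
print(sorted(top10.tolist()))
print(int(dbscan_outliers.sum()), "points flagged as noise by DBSCAN")
```

Note the difference in output: Isolation Forest yields a continuous score that can be ranked or thresholded, whereas DBSCAN yields a hard noise/not-noise flag that depends strongly on eps and min_samples.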

Participants will then be given two unlabelled datasets to make predictions on. Scores will be compared on a Kaggle-style leaderboard, with the emphasis on comparing techniques.

Workshop / Outcome

After the workshop, participants will:

  • Know the main algorithms for unsupervised outlier detection, and their pros and cons
  • Understand which scoring metrics can be used for highly imbalanced classification, and how these relate to business costs
  • Know how to set up an unsupervised classification pipeline
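As a rough illustration of how a scoring metric can be tied to business costs, the sketch below scores a set of fraud flags by the money they would cost. The cost model is an assumption made for this example, not something specified by the workshop: every flagged case incurs a fixed review fee, and every missed fraud loses its full transaction amount. The function name total_cost and the parameter review_cost are hypothetical.

```python
def total_cost(y_true, y_flagged, amounts, review_cost=5.0):
    """Business cost of a set of fraud flags (lower is better).

    Assumed cost model (illustrative only):
      - every flagged case costs a fixed review fee, fraud or not
      - every fraud that is not flagged costs its full amount
    """
    cost = 0.0
    for is_fraud, flagged, amount in zip(y_true, y_flagged, amounts):
        if flagged:
            cost += review_cost
        elif is_fraud:
            cost += amount
    return cost


# Made-up example: five transactions, two of them fraudulent.
y_true = [0, 0, 1, 0, 1]
amounts = [20.0, 15.0, 300.0, 40.0, 8.0]

# Model A catches the large fraud but raises one false alarm.
flags_a = [0, 0, 1, 1, 0]
# Model B catches only the small fraud and raises no false alarm.
flags_b = [0, 0, 0, 0, 1]

cost_a = total_cost(y_true, flags_a, amounts)
cost_b = total_cost(y_true, flags_b, amounts)
print(cost_a, cost_b)
```

Both models catch one of the two frauds, and model B even has the higher precision, yet under this cost model B is far more expensive because the fraud it misses is the large one. Count-based metrics alone would not show this.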

Workshop / Difficulty

Intermediate level

Workshop / Prerequisites

  • Intermediate Python skills
  • Basic understanding of Machine Learning concepts
  • Your own laptop with Anaconda and scikit-learn (version 0.21 or later) installed

Track / Co-organizers

Ernst Oldenhof

Data Scientist, Julius Bär

AMLD EPFL 2020 / Workshops

A Conceptual Introduction to Reinforcement Learning

With Bram Vandendriessche, Kevin Smeyers & Katrien Van Meulder

09:00-12:30 January 25

Applied Machine Learning with R

With Dirk Wulff

09:00-17:00 January 25

Augmenting the Web browsing experience using machine learning

With Tudor Avram, Oleksandr Paraska, Vasily Kuznetsov & Levan Tsinadze

09:00-12:30 January 25
