Challenge / Overview

Named Entity Recognition (NER) and Entity Classification (EC) are well-known tasks in Natural Language Processing. Detection of Personal Data Entities (PDE) in unstructured text is a specialized form of NER and EC that is required for applications such as data loss prevention, de-identification, and bias detection.

Because of recent advances in Deep Learning, it has become possible to detect PDEs in text with high precision and recall. However, this research requires large amounts of text containing personal information, which is hard to obtain for privacy reasons. While a few domain-specific datasets such as i2b2 and MIMIC exist, there are restrictions on their usage.

One feasible way to generate datasets for this research is to impute random, unrelated PDEs into already redacted data. This challenge aims to produce such a synthetic dataset that could be made publicly available to the research community.
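As a rough illustration of the imputation idea, the sketch below fills placeholders in redacted text with randomly chosen surrogate values and records where each one lands, so the result could serve as labeled NER data. The `[TYPE]` placeholder format, the `SURROGATES` pools, and the `impute` function are all assumptions made for this example; the challenge does not specify them.

```python
import random
import re

# Hypothetical surrogate pools; a real dataset would draw from far larger lists.
SURROGATES = {
    "NAME": ["Alice Martin", "Ravi Kumar", "Chen Wei"],
    "CITY": ["Lausanne", "Chennai", "Toronto"],
    "EMAIL": ["user1@example.com", "user2@example.org"],
}

# Assumed placeholder format: an upper-case entity type in square brackets.
PLACEHOLDER = re.compile(r"\[([A-Z_]+)\]")


def impute(redacted, rng):
    """Replace each known placeholder with a random surrogate value.

    Returns the filled text and a list of (start, end, entity_type)
    spans that index into the filled text.
    """
    parts = []
    spans = []
    cursor = 0  # position in the redacted input
    length = 0  # length of the output built so far
    for match in PLACEHOLDER.finditer(redacted):
        entity_type = match.group(1)
        if entity_type not in SURROGATES:
            continue  # unknown placeholders are left verbatim
        literal = redacted[cursor:match.start()]
        value = rng.choice(SURROGATES[entity_type])
        parts.append(literal)
        length += len(literal)
        parts.append(value)
        spans.append((length, length + len(value), entity_type))
        length += len(value)
        cursor = match.end()
    parts.append(redacted[cursor:])
    return "".join(parts), spans
```

Calling `impute("[NAME] moved to [CITY].", random.Random(0))` returns the filled sentence together with character spans for the inserted name and city. Because the surrogates are drawn independently of the original text, the inserted entities are unrelated to whatever was redacted.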

Challenge / Co-organizers

Balaji Ganesan

Research Software Engineer, IBM Research Lab

Kalapriya Kannan

Senior Research Engineer, IBM Research

AMLD EPFL 2020 / Challenges

D’Avatar - Reincarnation of Personal Data Entities in Unstructured Text Datasets

With Balaji Ganesan & Kalapriya Kannan

October 30-December 31, 2019
