Recent advances in deep learning have made it possible to detect PDEs in text with high precision and recall. However, this research requires large amounts of text containing personal information, which is hard to obtain for privacy reasons. While a few domain-specific datasets such as i2b2 and MIMIC exist, their usage is restricted.
One feasible way to generate datasets for this research is to impute random, unrelated PDEs into already redacted data. This challenge aims to produce such a synthetic dataset, one that could be made publicly available to the research community.
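As a rough illustration of the imputation idea, the sketch below fills redaction placeholders with randomly chosen surrogate values and records where each one lands, so the result can serve as labelled training text. Everything specific here is an assumption made for the example rather than part of the challenge: the bracketed placeholder format (`[NAME]`, `[PHONE]`), the tiny `SURROGATES` pools, and the function name `impute_pdes` are all hypothetical.

```python
import random
import re

# Hypothetical surrogate pools; a real dataset would draw from far larger lists.
SURROGATES = {
    "NAME": ["Alice Moreau", "Rahul Gupta", "Chen Wei"],
    "PHONE": ["555-0100", "555-0142", "555-0199"],
    "CITY": ["Springfield", "Riverton", "Lakewood"],
}

# Assumed redaction format: an upper-case category tag in square brackets.
PLACEHOLDER = re.compile(r"\[([A-Z]+)\]")


def impute_pdes(redacted_text, rng):
    """Fill known placeholders with random surrogates.

    Returns the synthetic text and a list of (start, end, label) spans
    that index into the synthetic text.
    """
    pieces = []
    spans = []
    cursor = 0   # read position in the redacted input
    length = 0   # number of characters written to the output so far
    for match in PLACEHOLDER.finditer(redacted_text):
        label = match.group(1)
        if label not in SURROGATES:
            continue  # unknown tags are copied through unchanged
        prefix = redacted_text[cursor:match.start()]
        pieces.append(prefix)
        length += len(prefix)
        value = rng.choice(SURROGATES[label])
        spans.append((length, length + len(value), label))
        pieces.append(value)
        length += len(value)
        cursor = match.end()
    pieces.append(redacted_text[cursor:])
    return "".join(pieces), spans


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so a run can be reproduced
    text, spans = impute_pdes("Patient [NAME] from [CITY] called [PHONE].", rng)
    print(text)
    for start, end, label in spans:
        print(label, repr(text[start:end]))
```

Because the surrogates are drawn independently of the original document, the imputed values are unrelated to whatever was redacted, and the recorded spans give the ground-truth labels a detector would be trained and scored against.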