Interpretability in deep learning for computational biology

09:00-12:30, January 25

Workshop / Overview

The recent application of deep neural networks to long-standing problems such as the prediction of functional DNA sequences, the inference of protein-protein interactions, or the detection of cancer cells in histopathology images has brought a breakthrough in performance and predictive power.

However, high accuracy often comes at the price of a loss of interpretability: many of these models are built as black boxes that fail to provide new biological insights.

This tutorial illustrates some of the recent advances in the field of interpretable artificial intelligence. We will show how smaller, explainable models can achieve levels of performance similar to those of more cumbersome ones, while shedding light on the underlying biological principles driving model decisions.

We will demonstrate how to build models and extract knowledge from them using interpretable approaches in different domains of computational biology, including the analysis of single-cell data, the prediction of functional sequences from raw DNA, and drug sensitivity prediction.

The choice of these applications is motivated by the availability of datasets large enough to support deep learning (DL) approaches and by their high relevance for personalized medicine. We will use both publicly available deep learning models and models developed in-house.

Workshop / Outcome

The tutorial aims to strike the right balance between theoretical input and practical exercises. It is designed to provide participants not only with the theory behind DL and interpretability, but also with a set of frameworks, tools, and real-life examples that they can apply in their own projects.

Specifically, participants will acquire or refresh basic knowledge of DL models for computational biology through a brief technical introduction and a showcase of established models for specific practical applications. Next, several techniques to enhance model interpretability will be explored.

In the first case study, a multimodal drug sensitivity prediction model will be introduced and discussed, with an emphasis on neural attention mechanisms that identify the genes and molecular substructures driving the model's decisions. In the second case study, the problem of predicting transcription factor binding sites from raw DNA sequences will be used to demonstrate several interpretability techniques, followed by their evaluation and comparison.
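To give a flavour of the kind of interpretability technique discussed in the second case study, the sketch below computes a simple gradient-times-input saliency map for a toy convolutional model that scores a one-hot encoded DNA sequence. The model architecture, sequence length, and PyTorch usage are illustrative assumptions, not the workshop's actual code.

```python
# Minimal sketch (illustrative only): gradient x input saliency for a toy
# CNN scoring a one-hot encoded DNA sequence (binding-site prediction style).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model: 1D convolution over a one-hot DNA sequence (4 channels: A, C, G, T)
model = nn.Sequential(
    nn.Conv1d(4, 16, kernel_size=8, padding=4),
    nn.ReLU(),
    nn.AdaptiveMaxPool1d(1),
    nn.Flatten(),
    nn.Linear(16, 1),  # scalar binding-site score
)

# Random one-hot sequence of length 100 as a stand-in for real data
seq = torch.zeros(1, 4, 100)
seq[0, torch.randint(0, 4, (100,)), torch.arange(100)] = 1.0
seq.requires_grad_(True)

# Backpropagate the predicted score to the input sequence
score = model(seq).sum()
score.backward()

# Gradient x input, summed over the four bases, gives a per-position importance
saliency = (seq.grad * seq).sum(dim=1).squeeze()
print(saliency.shape)  # torch.Size([100]); large magnitudes mark influential positions
```

Attribution maps of this kind can then be compared against known binding motifs to check whether the model relies on biologically plausible signals.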

Workshop / Difficulty

Beginner level

Workshop / Prerequisites

  • Basic programming knowledge of Python and shell scripting
  • Own laptop

Track / Co-organizers

An-phi Nguyen

PhD Student, IBM Research Zurich & ETH Zurich

Matteo Manica

Research Staff Member, IBM Research

Maria Rodriguez Martinez

Technical Lead of Systems Biology, IBM Research Zurich

