Evolution of Representations in the Transformer

15:30-16:00, January 28 @ 5ABC

Talk/ Overview

Recently, the analysis of deep neural networks has been an active topic of research. While previous work has mostly used so-called 'probing tasks' and made some interesting observations, an explanation of the process behind the observed behavior has been lacking.

I attempt to explain more generally why such behavior is observed by characterizing how the learning objective determines the information flow in the model. In particular, I consider how the representations of individual tokens in the Transformer evolve between layers under different learning objectives: machine translation (MT), language modeling (LM), and masked language modeling (MLM, aka BERT). I look at this task from the information bottleneck perspective on learning in neural networks.
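As a minimal, illustrative sketch (not the analysis from the talk itself), the snippet below shows one way to inspect how token representations change from layer to layer in a pretrained MLM. It assumes the Hugging Face transformers library, and the choice of the bert-base-uncased checkpoint and cosine similarity between consecutive layers are hypothetical stand-ins for the talk's information-theoretic measurements.

```python
# Sketch: track per-layer token representations in a pretrained masked LM.
# Assumes the Hugging Face `transformers` library; checkpoint and similarity
# measure are illustrative choices, not those used in the talk.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

sentence = "Representations of individual tokens evolve between layers."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple: (embedding layer, layer 1, ..., last layer),
# each of shape (batch, sequence_length, hidden_size).
hidden_states = outputs.hidden_states

# Rough proxy for how much each layer transforms its input: mean cosine
# similarity between a token's representation at consecutive layers.
for layer in range(1, len(hidden_states)):
    prev, curr = hidden_states[layer - 1][0], hidden_states[layer][0]
    sim = torch.nn.functional.cosine_similarity(prev, curr, dim=-1).mean().item()
    print(f"layer {layer:2d}: mean cosine similarity to previous layer = {sim:.3f}")
```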

I will show that:
- LMs gradually forget the past when forming predictions about the future;
- for MLMs, the evolution proceeds in two stages: context encoding and token reconstruction;
- MT representations get refined with context, but undergo less processing overall.

Talk/ Speakers

Lena Voita

Research Scientist, Yandex Research; PhD student, University of Amsterdam & University of Edinburgh

Talk/ Slides

Download the slides for this talk (PDF).
