Recently, the analysis of deep neural networks has become an active area of research. While previous work has mostly relied on so-called 'probing tasks' and has made some interesting observations, an explanation of the process behind the observed behavior has been lacking.
I attempt to explain more generally why such behavior is observed by characterizing how the learning objective determines the information flow in the model. In particular, I consider how the representations of individual tokens in the Transformer evolve between layers under different learning objectives: machine translation (MT), language modeling (LM), and masked language modeling (MLM, aka BERT). I look at this task from the information bottleneck perspective on learning in neural networks.
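To make the setup concrete, below is a minimal sketch of how one can extract per-layer token representations and track how much they change between layers. It assumes the HuggingFace `transformers` library with the publicly available `gpt2` checkpoint as a stand-in for the LM discussed in the talk; the function names and the example sentence are my own illustration, not part of the original work.

```python
# Sketch: extract each layer's representation of every token, then measure
# how much a single token's representation changes from layer to layer.
import torch
from transformers import AutoTokenizer, AutoModel

def layerwise_token_states(model_name: str, text: str):
    """Return a list of (seq_len, hidden_dim) tensors, one per layer."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states[0] is the embedding layer; the rest are Transformer layers.
    return [h.squeeze(0) for h in outputs.hidden_states]

def layer_to_layer_similarity(states, position: int):
    """Cosine similarity of one token's representation between adjacent layers."""
    return [
        torch.nn.functional.cosine_similarity(lower[position], upper[position], dim=0).item()
        for lower, upper in zip(states[:-1], states[1:])
    ]

# Example: track the third token of a sentence through a GPT-2 language model.
states_lm = layerwise_token_states("gpt2", "The cat sat on the mat")
print(layer_to_layer_similarity(states_lm, position=2))
```

The same routine can be run on an MLM (e.g. a BERT checkpoint) or an MT encoder to compare how strongly each objective reshapes token representations across depth.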
I will show that:
- LMs gradually forget the past when forming predictions about the future;
- for MLMs, the evolution proceeds in two stages of context encoding and token reconstruction;
- MT representations get refined with context, but undergo less processing.
Download the slides for this talk (PDF).