Protein language models are deep learning models based on natural language processing methods, in particular transformer architectures. They are trained on large ensembles of protein sequences and capture long-range dependencies within a protein sequence. They are able to predict protein three-dimensional structure from a single sequence in an unsupervised way. The great success of supervised protein structure prediction by AlphaFold is partly based on these approaches. It is therefore of strong interest to assess their generative ability. We will compare their generative properties to those of bmDCA, a state-of-the-art Potts model that is known to be generative. Then, we will discuss how these models learn phylogeny in addition to structural constraints.
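For reference, bmDCA fits a Potts model to an aligned protein family; a standard form of that model, written in the usual direct coupling analysis (DCA) notation (single-site fields h_i and pairwise couplings J_ij over an aligned sequence of length L), is sketched below. This notation comes from the DCA literature rather than from the talk itself.

% Potts model energy and sequence probability in standard DCA notation
% (h_i: fields, J_ij: couplings, Z: partition function)
E(a_1, \dots, a_L) = -\sum_{i=1}^{L} h_i(a_i) \;-\; \sum_{1 \le i < j \le L} J_{ij}(a_i, a_j),
\qquad
P(a_1, \dots, a_L) = \frac{e^{-E(a_1, \dots, a_L)}}{Z}.

bmDCA infers the fields and couplings by Boltzmann-machine learning, so that sampling from P reproduces the empirical single-site and pairwise amino-acid statistics of the family, which is what makes it a natural generative baseline for the comparison above.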
Download the slides for this talk (PDF).