Event

Predicting protein structures: from AlphaFold2 to single sequence prediction

  • Speaker  Dr. Nazim Bouatta, invited by Professor Massimiliano Esposito

  • Location

    Campus Limpertsberg, Bâtiment des Sciences, Room BS 3.04

    LU

  • Topic(s)
    Physics & Materials Science

Physics meets Biology Hybrid Colloquium

AlphaFold2, DeepMind’s machine-learning algorithm, represents a stunning advance on one of biology’s grand challenges: predicting the 3D structure of a protein from its amino acid sequence. AlphaFold2’s performance demonstrates the remarkable power of deep learning in molecular problems when co-evolutionary information is available in the form of multiple sequence alignments (MSAs). However, despite the outstanding performance of AlphaFold2 (AF2), many challenges remain, including (i) prediction of orphan and rapidly evolving proteins for which an MSA cannot be generated and (ii) rapid exploration of human-designed proteins.

In this talk, I will discuss AlphaFold2’s key features, including its use of (i) Transformers to capture long-range dependencies and (ii) symmetry principles, in the form of a module that is equivariant under 3D rotations and translations (for a high-level perspective of AlphaFold2, see our review (Bouatta et al., 2021)). I will then describe our deep learning system RGN2, which predicts the 3D structure of a protein from a single sequence without the use of MSAs, a capability relevant for mutagenesis and protein design (Chowdhury et al., 2021). One key feature of RGN2 is its use of a protein language model inspired by models originally developed for natural language processing. As a result, we outperform all major methods, including AlphaFold2, on proteins without homologs.

Bouatta, N., Sorger, P. & AlQuraishi, M. (2021). Acta Crystallogr. Sect. D Struct. Biol. 77, 982–991.

Chowdhury, R.*, Bouatta, N.*, Biswas, S.*, Rochereau, C., Church, G. M., Sorger, P. K. & AlQuraishi, M. (2021). Single-sequence protein structure prediction using language models from deep learning. bioRxiv: https://doi.org/10.1101/2021.08.02.454840

About the speaker:

Nazim Bouatta is a Senior Research Fellow in the Department of Systems Biology and the Laboratory of Systems Pharmacology at Harvard, and an affiliate member of the Department of Systems Biology at Columbia. He received his Ph.D. in theoretical high-energy physics. After working on the foundations of physics at the University of Cambridge, he joined Harvard and transitioned to biology. His research concerns the relations between deep learning, physics, and biology, with a special focus on building models of biomolecules and their interactions. He recently co-led an effort to build OpenFold, a trainable version of AlphaFold2 implemented in PyTorch.