Applying and improving AlphaFold at CASP14

We describe the operation and improvement of AlphaFold, the system that was entered by the team AlphaFold2 in the “human” category of the 14th Critical Assessment of Protein Structure Prediction (CASP14). The AlphaFold system entered in CASP14 is entirely different from the one entered in CASP13. It used a novel end‐to‐end deep neural network trained to produce protein structures from amino acid sequence, multiple sequence alignments, and homologous proteins. In the assessors' ranking by summed z scores (>2.0), AlphaFold scored 244.0 compared to 90.8 by the next best group. The predictions made by AlphaFold had a median domain GDT_TS of 92.4; this is the first time that this level of average accuracy has been achieved during CASP, especially on the more difficult Free Modeling targets, and it represents a significant improvement in the state of the art in protein structure prediction. We report how AlphaFold was run as a human team during CASP14 and how it was improved such that it now achieves an equivalent level of performance without intervention, opening the door to highly accurate large‐scale structure prediction.
