Accurate prediction of protein structures and interactions using a 3-track neural network

Deep learning takes on protein folding

In 1972, Anfinsen won a Nobel Prize for demonstrating a connection between a protein's amino acid sequence and its three-dimensional structure. Since 1994, scientists have competed in the biennial Critical Assessment of Structure Prediction (CASP) protein-folding challenge. Deep learning methods took center stage at CASP14, with DeepMind's AlphaFold2 achieving remarkable accuracy. Baek et al. explored network architectures based on the DeepMind framework. They used a three-track network to process sequence, distance, and coordinate information simultaneously and achieved accuracies approaching those of DeepMind. The method, RoseTTAFold, can solve challenging x-ray crystallography and cryo–electron microscopy modeling problems and generate accurate models of protein-protein complexes.

Science, abj8754, this issue p. 871

Protein structure modeling enables the rapid solution of protein structures and provides insights into function. DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo–electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.
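To make the three levels of information concrete, the sketch below builds a toy example of each representation: a 1D sequence string, 3D coordinates with one point per residue, and the 2D distance map derived from those coordinates. It illustrates the data representations only and is not the network or any part of the released method; the function name `distance_map`, the three-residue sequence, and the coordinates are hypothetical values chosen for illustration.

```python
import math


def distance_map(coords):
    """Return the symmetric N x N matrix of Euclidean distances between points."""
    n = len(coords)
    dmap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            dmap[i][j] = d
            dmap[j][i] = d
    return dmap


# Hypothetical toy input (not real protein data).
sequence = "ACD"  # 1D level: one letter per residue
coords = [        # 3D level: one (x, y, z) point per residue, in angstroms
    (0.0, 0.0, 0.0),
    (3.0, 4.0, 0.0),
    (3.0, 4.0, 12.0),
]
dmap = distance_map(coords)  # 2D level: pairwise residue-residue distances

for residue, row in zip(sequence, dmap):
    print(residue, [round(d, 1) for d in row])
```

The distance map is fully determined by the coordinates but not the reverse (a mirror-image structure gives the same map), which is one reason the 2D and 3D levels carry different information.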
