Accurate prediction of protein structures and interactions using a 3-track neural network

DeepMind presented remarkably accurate protein structure predictions at the CASP14 conference. We explored network architectures incorporating related ideas and obtained the best performance with a 3-track network in which information at the 1D sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The 3-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables rapid solution of challenging X-ray crystallography and cryo-EM structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate models of protein-protein complexes from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.

One-Sentence Summary: Accurate protein structure modeling enables rapid solution of structure determination problems and provides insights into biological function.
