End-to-end differentiable learning of protein structure

Predicting protein structure from sequence is a central challenge of biochemistry. Co-evolution methods show promise, but an explicit sequence-to-structure map remains elusive. Advances in deep learning that replace complex, human-designed pipelines with differentiable models optimized end-to-end suggest that structure prediction could benefit from a similar reformulation. Here we report the first end-to-end differentiable model of protein structure. The model couples local and global protein structure via geometric units that optimize global geometry without violating local covalent chemistry. We test the model on two challenging tasks: predicting novel folds without co-evolutionary data and predicting known folds without structural templates. In the first task the model achieves state-of-the-art accuracy, and in the second it comes within 1-2 Å of competing methods. Those methods, which use co-evolution and experimental templates, have been refined over many years, so the differentiable approach likely has substantial room for further improvement, with applications ranging from drug discovery to protein design.
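The idea of a geometric unit that changes global geometry while leaving local covalent chemistry intact can be illustrated with a small sketch. It assumes, as an illustration only, that the chain is parameterized by torsion angles with bond lengths and bond angles held fixed, and that each new atom is placed from the three preceding atoms. The function names (`place_atom`, `extend_chain`) and the default bond length and bond angle are hypothetical placeholders, not values or code from the model described above.

```python
import math

def _sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _unit(u):
    norm = math.sqrt(sum(ui * ui for ui in u))
    return tuple(ui / norm for ui in u)

def place_atom(a, b, c, bond_length, bond_angle, torsion):
    """Place atom d after atoms a, b, c (angles in radians).

    |c-d| equals bond_length and the angle b-c-d equals bond_angle by
    construction; only the torsion a-b-c-d is a free parameter.
    """
    bc = _unit(_sub(c, b))                 # axis along the last bond
    n = _unit(_cross(_sub(b, a), bc))      # normal to the a-b-c plane
    m = _cross(n, bc)                      # completes the local frame
    # Displacement of d expressed in the local frame (bc, m, n).
    x = -bond_length * math.cos(bond_angle)
    y = bond_length * math.sin(bond_angle) * math.cos(torsion)
    z = bond_length * math.sin(bond_angle) * math.sin(torsion)
    return tuple(c[i] + x * bc[i] + y * m[i] + z * n[i] for i in range(3))

def extend_chain(seed, torsions, bond_length=1.5,
                 bond_angle=math.radians(111.0)):
    """Grow a chain from three seed atoms, one atom per torsion angle.

    The default bond_length and bond_angle are illustrative placeholders.
    """
    coords = list(seed)
    for phi in torsions:
        a, b, c = coords[-3:]
        coords.append(place_atom(a, b, c, bond_length, bond_angle, phi))
    return coords

if __name__ == "__main__":
    seed = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    for atom in extend_chain(seed, [0.5, -1.0, 2.0]):
        print(atom)
```

Because every step is built from differentiable operations, a loss on the final coordinates could in principle be propagated back to the torsion angles, while each bond length and bond angle stays fixed regardless of the torsions chosen.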
