Enhancing Evolutionary Couplings with Deep Convolutional Neural Networks

SUMMARY While genes are defined by sequence, in biological systems a protein’s function is largely determined by its three-dimensional structure. Evolutionary information embedded within multiple sequence alignments provides a rich source of data for inferring structural constraints on macromolecules. Still, many proteins of interest lack sufficient numbers of related sequences, leading to noisy, error-prone residue-residue contact predictions. Here we introduce DeepContact, a convolutional neural network (CNN)-based approach that discovers co-evolutionary motifs and leverages these patterns to enable accurate inference of contact probabilities, particularly when few related sequences are available. DeepContact significantly improves performance over previous methods, including in the CASP12 blind contact prediction task where we achieved top performance with another CNN-based approach. Moreover, our tool converts hard-to-interpret coupling scores into probabilities, moving the field toward a consistent metric to assess contact prediction across diverse proteins. Through substantially improving the precision-recall behavior of contact prediction, DeepContact suggests we are near a paradigm shift in template-free modeling for protein structure prediction.

[1]  Rich Caruana,et al.  Predicting good probabilities with supervised learning , 2005, ICML.

[2]  David E. Kim,et al.  Improved de novo structure prediction in CASP11 by incorporating coevolution information into Rosetta , 2016, Proteins.

[3]  Simona Cocco,et al.  Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction , 2015, Nucleic acids research.

[4]  Sean R. Eddy,et al.  Profile hidden Markov models , 1998, Bioinform..

[5]  O. Stegle,et al.  Deep learning for computational biology , 2016, Molecular systems biology.

[6]  Markus Gruber,et al.  CCMpred—fast and precise prediction of protein residue–residue contacts from correlated mutations , 2014, Bioinform..

[7]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[8]  Thomas A. Hopf,et al.  Sequence co-evolution gives 3D contacts and structures of protein complexes , 2014, eLife.

[9]  C. Sander,et al.  Direct-coupling analysis of residue coevolution captures native contacts across many protein families , 2011, Proceedings of the National Academy of Sciences.

[10]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[11]  D. Phillips,et al.  THE HEN EGG-WHITE LYSOZYME MOLECULE , 1967 .

[12]  D. Baker,et al.  Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era , 2013, Proceedings of the National Academy of Sciences.

[13]  A. Tramontano,et al.  Critical assessment of methods of protein structure prediction: Progress and new directions in round XI , 2016, Proteins.

[14]  S. Bryant,et al.  Critical assessment of methods of protein structure prediction (CASP): Round II , 1997, Proteins.

[15]  Mohammed J. Zaki,et al.  Mining Protein Contact Maps , 2002, BIOKDD.

[16]  Burkhard Rost,et al.  FreeContact: fast and free software for protein contact prediction from residue co-evolution , 2014, BMC Bioinformatics.

[17]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[18]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[19]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[20]  R J Read,et al.  Crystallography & NMR system: A new software suite for macromolecular structure determination. , 1998, Acta crystallographica. Section D, Biological crystallography.

[21]  Liam J. McGuffin,et al.  The PSIPRED protein structure prediction server , 2000, Bioinform..

[22]  D. Baker,et al.  Robust and accurate prediction of residue–residue interactions across protein interfaces using evolutionary information , 2014, eLife.

[23]  Geoffrey E. Hinton,et al.  On the importance of initialization and momentum in deep learning , 2013, ICML.

[24]  Martin Weigt,et al.  Large-scale identification of coevolution signals across homo-oligomeric protein interfaces by direct coupling analysis , 2017, Proceedings of the National Academy of Sciences.

[25]  P. Bradley,et al.  Toward High-Resolution de Novo Structure Prediction for Small Proteins , 2005, Science.

[26]  O. Troyanskaya,et al.  Predicting effects of noncoding variants with deep learning–based sequence model , 2015, Nature Methods.

[27]  Steven E. Brenner,et al.  SCOPe: Structural Classification of Proteins—extended, integrating SCOP and ASTRAL data and classification of new structures , 2013, Nucleic Acids Res..

[28]  Georgios A. Pavlopoulos,et al.  Protein structure determination using metagenome sequence data , 2017, Science.

[29]  Bonnie Berger,et al.  Compact Integration of Multi-Network Topology for Functional Analysis of Genes. , 2016, Cell systems.

[30]  Aditya G. Parameswaran,et al.  On the Interpretability of Conditional Probability Estimates in the Agnostic Setting , 2015, AISTATS.

[31]  Thomas A. Hopf,et al.  Mutation effects predicted from sequence co-variation , 2017, Nature Biotechnology.

[32]  Jianlin Cheng,et al.  CONFOLD: Residue‐residue contact‐guided ab initio protein folding , 2015, Proteins.

[33]  David E. Kim,et al.  One contact for every twelve residues allows robust and accurate topology‐level protein structure modeling , 2014, Proteins.

[34]  Juergen Haas,et al.  The Protein Model Portal—a comprehensive resource for protein structure and model information , 2013, Database J. Biol. Databases Curation.

[35]  Yang Zhang,et al.  Ab initio protein structure assembly using continuous structure fragments and optimized knowledge‐based force field , 2012, Proteins.

[36]  Thomas A. Hopf,et al.  Structured States of Disordered Proteins from Genomic Sequences , 2016, Cell.

[37]  Yang Zhang,et al.  How significant is a protein structure similarity with TM-score = 0.5? , 2010, Bioinform..

[38]  A. Biegert,et al.  HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment , 2011, Nature Methods.

[39]  Anna Tramontano,et al.  Critical assessment of methods of protein structure prediction (CASP) — round x , 2014, Proteins.

[40]  Thomas A. Hopf,et al.  Three-Dimensional Structures of Membrane Proteins from Genomic Sequencing , 2012, Cell.

[41]  et al.,et al.  Jupyter Notebooks - a publishing format for reproducible computational workflows , 2016, ELPUB.

[42]  David T. Jones,et al.  MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins , 2014, Bioinform..

[43]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[44]  Thomas A. Hopf,et al.  Protein 3D Structure Computed from Evolutionary Sequence Variation , 2011, PloS one.

[45]  Lena Jaeger,et al.  Introduction To Protein Structure , 2016 .

[46]  Zhen Li,et al.  Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model , 2016, bioRxiv.

[47]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Lei Zhang,et al.  Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising , 2016, IEEE Transactions on Image Processing.

[49]  Massimiliano Pontil,et al.  PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments , 2012, Bioinform..

[50]  Adam J. Riesselman,et al.  3D RNA and Functional Interactions from Evolutionary Couplings , 2015, Cell.

[51]  Andriy Kryshtafovych,et al.  Assessment of contact predictions in CASP12: Co‐evolution and deep learning coming of age , 2017, Proteins.

[52]  J. Skolnick,et al.  TM-align: a protein structure alignment algorithm based on the TM-score , 2005, Nucleic acids research.

[53]  E. Aurell,et al.  Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. , 2012, Physical review. E, Statistical, nonlinear, and soft matter physics.

[54]  B. Berger,et al.  Predicting coiled coils by use of pairwise residue correlations. , 1995, Proceedings of the National Academy of Sciences of the United States of America.

[55]  Yang Zhang Progress and challenges in protein structure prediction. , 2008, Current opinion in structural biology.

[56]  M. James,et al.  The structures of Thermoplasma volcanium phosphoribosyl pyrophosphate synthetase bound to ribose-5-phosphate and ATP analogs. , 2011, Journal of molecular biology.

[57]  Cristina Marino Buslje,et al.  Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information , 2009, Bioinform..