Improved Contact Predictions Using the Recognition of Protein Like Contact Patterns

Given sufficiently large protein families, and using a global statistical inference approach, it is possible to predict protein residue contacts accurately enough to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but follow precise rules governed by the structure of the protein and are thus interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts, independently of the number of aligned sequences, residue separation or secondary structure type, but it is largest for β-sheet-containing proteins. In addition to being superior to earlier methods based on statistical inference, PconsC2 is superior to state-of-the-art machine learning methods for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.
