Opportunities and limitations in applying coevolution-derived contacts to protein structure prediction

Abstract

The prospect of identifying contacts in protein structures purely from aligned protein sequences has long attracted researchers, but progress was modest until recently. Here, we review the most successful methods for identifying structural contacts from sequence, describe how these methods differ, and make an initial assessment of the overlap between the contacts predicted by alternative approaches. We then discuss the limitations of these methods and the possibilities for future development. We also highlight recent applications of predicted contacts in three areas: tertiary structure prediction, identifying residues at protein-protein interaction interfaces, and disentangling alternative conformational states. Finally, we identify the current challenges in contact prediction, concentrating on the limitations imposed by the available data, the dependence on sequence alignments, and possible future developments.
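The following minimal Python sketch shows what "identifying contacts purely from aligned sequences" means in practice. It uses one of the simplest covariation scores: mutual information (MI) between alignment columns, corrected with the average product correction (APC). It is deliberately simpler than global statistical approaches such as direct-coupling analysis or sparse inverse covariance estimation, which additionally try to separate direct couplings from indirect, transitive correlations. The function names, the toy alignment and the `min_separation` parameter are hypothetical illustrations. The sketch assumes an ungapped alignment of equal-length strings and omits the sequence weighting, pseudocounts and gap handling that practical tools apply.

```python
import math
from collections import Counter
from itertools import combinations


def column_mi(msa, i, j):
    """Mutual information (nats) between alignment columns i and j, from raw frequencies."""
    n = len(msa)
    p_i = Counter(seq[i] for seq in msa)
    p_j = Counter(seq[j] for seq in msa)
    p_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


def mi_apc_scores(msa):
    """Score every column pair (i, j), i < j, as MI minus the average product correction."""
    length = len(msa[0])
    mi = {(i, j): column_mi(msa, i, j) for i, j in combinations(range(length), 2)}
    # Mean MI of each column with every other column.
    col_mean = [
        sum(v for pair, v in mi.items() if k in pair) / (length - 1)
        for k in range(length)
    ]
    overall = sum(mi.values()) / len(mi)
    scores = {}
    for (i, j), value in mi.items():
        # APC(i, j) = mean_i * mean_j / overall mean: an estimate of the background
        # covariation (e.g. from shared phylogeny) that affects all pairs.
        apc = col_mean[i] * col_mean[j] / overall if overall > 0 else 0.0
        scores[(i, j)] = value - apc
    return scores


def top_pairs(scores, k, min_separation=1):
    """Return the k highest-scoring pairs at least min_separation apart in sequence."""
    eligible = [(pair, s) for pair, s in scores.items() if pair[1] - pair[0] >= min_separation]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return eligible[:k]


# Hypothetical toy alignment: columns 0 and 3 change together (A with E, C with R),
# while columns 1 and 2 vary independently of everything else.
toy_msa = ["AKLE", "AVME", "CKMR", "CVLR", "AVLE", "AKME", "CVMR", "CKLR"]

for pair, score in top_pairs(mi_apc_scores(toy_msa), k=3):
    print(pair, round(score, 3))  # the covarying pair (0, 3) ranks first
```

On this toy alignment, only the pair (0, 3) receives a positive score; every other pair scores zero. In real analyses, pairs that are close in sequence are usually excluded by raising `min_separation`, because they are in contact trivially and carry little information about the fold.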
