Prediction of contacts from correlated sequence substitutions.

Recent work has substantially improved the accuracy with which contacts between amino acid residues can be predicted from evolutionary information derived from multiple sequence alignments. Where large numbers of diverse sequence relatives are available and can be aligned to the sequence of a protein of unknown structure, it is now possible to generate high-resolution models without recourse to a template structure. In this review we describe these exciting new techniques and critically assess how they have changed the state of the art in contact prediction. While concentrating on methods, we also discuss applications to protein and RNA structure prediction, as well as potential future developments.
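To make the idea of predicting contacts from correlated substitutions concrete, here is a minimal Python sketch of one of the simpler local covariation scores: mutual information (MI) between pairs of alignment columns, with the average product correction (APC) subtracted to damp background signal. It assumes the alignment is a list of equal-length strings, treats the gap character as just another symbol, and omits the sequence weighting and pseudocounts used in practice. The function names are illustrative only. This is not one of the global methods (such as direct-coupling analysis or sparse inverse covariance estimation) behind most of the recent gains, which additionally try to separate direct from indirect couplings.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(msa, i, j):
    """MI (in nats) between alignment columns i and j, from raw frequencies."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def mi_apc_scores(msa):
    """Return {(i, j): MI_ij - APC_ij} for all column pairs with i < j."""
    length = len(msa[0])
    if length < 2 or any(len(seq) != length for seq in msa):
        raise ValueError("need >= 2 columns and equal-length sequences")
    mi = [[0.0] * length for _ in range(length)]
    for i, j in combinations(range(length), 2):
        mi[i][j] = mi[j][i] = mutual_information(msa, i, j)
    # Mean MI of each column with every other column, and the overall mean.
    col_mean = [sum(mi[i][k] for k in range(length) if k != i) / (length - 1)
                for i in range(length)]
    overall = sum(col_mean) / length
    scores = {}
    for i, j in combinations(range(length), 2):
        # APC is meant to estimate the background part of MI_ij that comes
        # from column entropy and shared phylogeny rather than contact.
        apc = col_mean[i] * col_mean[j] / overall if overall > 0 else 0.0
        scores[(i, j)] = mi[i][j] - apc
    return scores


def top_pairs(msa, k, min_sep=1):
    """The k highest-scoring pairs with j - i >= min_sep (candidate contacts)."""
    scores = mi_apc_scores(msa)
    pairs = [p for p in scores if p[1] - p[0] >= min_sep]
    return sorted(pairs, key=lambda p: scores[p], reverse=True)[:k]
```

On a toy alignment such as `["ALSK", "ALTK", "AMSK", "AMTK", "GLSR", "GLTR", "GMSR", "GMTR"]`, where columns 0 and 3 change together (A with K, G with R) while columns 1 and 2 vary independently, `top_pairs(msa, 1)` ranks the pair (0, 3) first. In real evaluations, pairs that are close in sequence are usually excluded (here via `min_sep`), and predictions are judged by the fraction of top-ranked pairs that are true contacts in the known structure.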
