Reliable and robust detection of coevolving protein residues.

Since the cooperative mechanism between interconnected residues plays a critical role in protein function, detecting coevolving residues is important for studying the various biological functions of proteins. In this work, we developed a new correlated mutation analysis method that shows substantially better prediction accuracy than all other methods. More importantly, the prediction accuracy of our new method is insensitive to the characteristics of the multiple sequence alignments (MSAs) from which the correlated mutation scores are calculated. Thanks to this desirable property, not only does it perform well even on MSAs automatically generated by sequence homology methods, which allows us to build a fully automatic, easy-to-use server named CMAT, but its performance is also consistently high on MSA columns containing a high fraction of gaps, which greatly extends the applicability of correlated mutation analysis. The key development of this work is a joint probability estimation that is greatly improved by using the sequence profile as prior knowledge, which is shown to be highly beneficial to correlated mutation analysis and its applications. From the computational perspective, we made two important findings: the sequence profile can be used to estimate the pseudocounts, and the consistency rule between joint probabilities and marginal probabilities is important for accurately estimating the joint probability. The web server and standalone program are freely available at http://binfolab12.kaist.ac.kr/cmat/.
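The two computational findings can be illustrated with a minimal sketch. It is not the CMAT implementation: the function names, the `weight` parameter, and the choice of the product of two per-column profile probabilities as the joint pseudocount are all assumptions made for illustration. With that choice, summing the estimated joint over one column reproduces the marginal estimated with the same profile pseudocount, which is one simple way to satisfy a consistency rule between joint and marginal probabilities. Mutual information is used here only as a familiar example of a correlated mutation score.

```python
import math
from collections import Counter, defaultdict


def joint_with_profile_prior(col_i, col_j, prof_i, prof_j, weight=1.0):
    """Estimate joint probabilities for two alignment columns.

    col_i, col_j: equal-length strings, one symbol per sequence.
    prof_i, prof_j: dicts mapping symbol -> profile probability (each sums
    to 1). Assumption: the profiles cover every symbol seen in the columns.
    weight: total pseudocount mass (hypothetical parameter).
    """
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))
    denom = n + weight
    # Pseudocount for (a, b) is weight * prof_i[a] * prof_j[b] (assumption).
    return {
        (a, b): (pair_counts.get((a, b), 0) + weight * pa * pb) / denom
        for a, pa in prof_i.items()
        for b, pb in prof_j.items()
    }


def marginals(joint):
    """Sum the joint table over each column to obtain both marginals."""
    p_i, p_j = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        p_i[a] += p
        p_j[b] += p
    return p_i, p_j


def mutual_information(joint):
    """Mutual information (in nats) of a joint probability table."""
    p_i, p_j = marginals(joint)
    return sum(
        p * math.log(p / (p_i[a] * p_j[b]))
        for (a, b), p in joint.items()
        if p > 0
    )
```

For example, calling `joint_with_profile_prior("AAAACCCC", "DDDDEEEE", prof, prof2, weight=2.0)` with uniform two-symbol profiles gives a joint table whose row sums equal `(count + weight * profile) / (N + weight)` for each symbol, so the joint and marginal estimates agree by construction.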
