Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior.

The degree to which an amino acid site is free to vary depends strongly on its structural and functional importance. An amino acid that plays an essential role is unlikely to change over evolutionary time. Hence, the evolutionary rate at an amino acid site indicates how conserved the site is and, in turn, allows its importance in maintaining the structure and function of the protein to be evaluated. When probabilistic methods are used for site-specific rate inference, only a few alternatives are available. In this study we use simulations to compare the maximum-likelihood and Bayesian paradigms. We study how inference accuracy depends on the number of sequences, the branch lengths, the shape of the rate distribution, and the sequence length. We also study the possibility of simultaneously estimating branch lengths and site-specific rates. Our results show that a Bayesian approach is superior to maximum likelihood under a wide range of conditions, indicating that the prior incorporated into the Bayesian computation significantly improves performance. We show that when branch lengths are unknown, it is better to estimate the branch lengths first and then estimate the site-specific rates than to estimate both simultaneously. Finally, we illustrate the difference between maximum-likelihood and Bayesian methods when analyzing site conservation for the apoptosis regulator protein Bcl-x(L).
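The contrast between the two paradigms can be illustrated with a deliberately simplified toy model, which is not the phylogenetic likelihood used in the study. The sketch below assumes that the number of substitutions observed at a site is Poisson-distributed with mean `rate * tree_length`, and that rates follow a gamma prior with mean 1 and shape `alpha`. Under these assumptions the maximum-likelihood estimate is `k / tree_length`, while the Bayesian posterior-mean estimate is `(alpha + k) / (alpha + tree_length)`. All function names, parameter values, and the Poisson simplification are illustrative assumptions.

```python
import math
import random


def ml_rate(k, tree_length):
    """Maximum-likelihood rate under the toy Poisson model: k / T."""
    return k / tree_length


def bayes_rate(k, tree_length, alpha):
    """Posterior-mean rate under a Gamma(shape=alpha, rate=alpha) prior.

    The gamma prior is conjugate to the Poisson likelihood, so the posterior
    is Gamma(shape=alpha + k, rate=alpha + T), with mean (alpha + k) / (alpha + T).
    """
    return (alpha + k) / (alpha + tree_length)


def sample_poisson(lam, rng):
    """Draw a Poisson variate with Knuth's multiplication method (small lam only)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def compare(n_sites=5000, tree_length=1.0, alpha=1.0, seed=0):
    """Simulate sites and return (MSE of ML estimate, MSE of Bayesian estimate)."""
    rng = random.Random(seed)
    se_ml = se_bayes = 0.0
    for _ in range(n_sites):
        # True site rate drawn from the prior (mean 1, shape alpha).
        true_rate = rng.gammavariate(alpha, 1.0 / alpha)
        # Toy data: substitution count over the whole tree (an assumption).
        k = sample_poisson(true_rate * tree_length, rng)
        se_ml += (ml_rate(k, tree_length) - true_rate) ** 2
        se_bayes += (bayes_rate(k, tree_length, alpha) - true_rate) ** 2
    return se_ml / n_sites, se_bayes / n_sites


if __name__ == "__main__":
    mse_ml, mse_bayes = compare()
    print(f"MSE (maximum likelihood): {mse_ml:.3f}")
    print(f"MSE (Bayesian posterior mean): {mse_bayes:.3f}")
```

In this toy setting a site with no observed substitutions receives a maximum-likelihood rate of exactly zero, whereas the Bayesian estimate is pulled toward the prior mean. That shrinkage is one intuitive reason a prior can improve accuracy when little data is available per site.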
