An evolutionary space-time model with varying among-site dependencies.

It is now widely accepted that sites in a protein do not undergo independent evolutionary processes. The underlying assumption is that proteins are composed of conserved and variable linear domains, and thus rates at neighboring sites are correlated. In this paper, we comprehensively examine the performance of an autocorrelation model of evolutionary rates in protein sequences. We further develop a model in which the level of correlation between rates at adjacent sites is not equal at all sites of the protein. High correlation is expected, for example, in linear functional domains. In nonlinear functional regions (e.g., active sites), by contrast, low correlation is expected because the interaction between distant sites imposes independence of rates in the linear sequence. Our model is based on a hidden Markov model, which accounts for autocorrelation in certain regions of the protein and rate independence in others. We study the differences between the novel model and models that assume either independence or a fixed level of dependence throughout the protein. Using a diverse collection of protein data sets, we show that the novel model fits most of them better. We further analyze the potassium-channel protein family and illustrate the relationship between the dependence of rates at adjacent sites and the tertiary structure of the protein.
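
The abstract does not give the model's parameterization, so the following is only a minimal sketch of the general idea, under stated assumptions. Each hidden state is taken to be a pair (region type, rate category). In a "dependent" region the rate at the next site is kept with probability `lam` and otherwise redrawn from the prior over rate categories; in an "independent" region it is always redrawn. The names `lam`, `switch`, `p_dep_start`, and the input `site_liks` (per-site likelihoods conditional on each rate category, assumed to be precomputed on a phylogeny) are hypothetical and not taken from the paper.

```python
import math

def build_transition(prior, lam, switch):
    """Transition matrix over hidden states (region, rate category).

    Assumed parameterization (not from the paper):
    region 0 = dependent, region 1 = independent.
    """
    n_cat = len(prior)
    states = [(r, k) for r in (0, 1) for k in range(n_cat)]
    trans = []
    for r, k in states:
        row = []
        for r2, k2 in states:
            # Region type persists with probability 1 - switch.
            p_region = 1.0 - switch if r2 == r else switch
            if r2 == 0:
                # Dependent region: keep the rate with probability lam,
                # otherwise redraw it from the prior.
                same = 1.0 if k2 == k else 0.0
                p_rate = lam * same + (1.0 - lam) * prior[k2]
            else:
                # Independent region: always redraw from the prior.
                p_rate = prior[k2]
            row.append(p_region * p_rate)
        trans.append(row)
    return states, trans

def log_likelihood(site_liks, prior, lam, switch, p_dep_start=0.5):
    """Scaled forward algorithm along the sequence.

    site_liks[i][k] is the likelihood of site i given rate category k.
    """
    states, trans = build_transition(prior, lam, switch)
    n = len(states)
    alpha = []
    for r, k in states:
        p_region = p_dep_start if r == 0 else 1.0 - p_dep_start
        alpha.append(p_region * prior[k] * site_liks[0][k])
    total = sum(alpha)
    loglik = math.log(total)
    alpha = [a / total for a in alpha]
    for lik in site_liks[1:]:
        new = []
        for j, (_, k2) in enumerate(states):
            incoming = sum(alpha[i] * trans[i][j] for i in range(n))
            new.append(lik[k2] * incoming)
        total = sum(new)
        loglik += math.log(total)  # accumulate the scaling factors
        alpha = [a / total for a in new]
    return loglik
```

In this sketch, setting `lam = 0` recovers the site-independence model, and forcing every site into the dependent region (`switch = 0`, `p_dep_start = 1`) gives a fixed level of dependence throughout the sequence, which are the two baselines the abstract compares against.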
