Spatial and temporal heterogeneity in nucleotide sequence evolution.

Models of nucleotide substitution make many simplifying assumptions about the evolutionary process, including that the same process acts on all sites in an alignment and on all branches of the phylogenetic tree. Many studies have shown that in reality the substitution process is heterogeneous and that this variability can introduce systematic errors into many forms of phylogenetic analysis. I propose a new rigorous approach for describing heterogeneity called a temporal hidden Markov model (THMM), which can distinguish between among-site (spatial) heterogeneity and among-lineage (temporal) heterogeneity. Several versions of the THMM are applied to 16 sets of aligned sequences to quantitatively assess the different forms of heterogeneity acting within them. The most general THMM provides the best fit in all the data sets examined, providing strong evidence of pervasive heterogeneity during evolution. Investigating individual forms of heterogeneity provides further insights. In agreement with previous studies, spatial rate heterogeneity (rates across sites [RAS]) is inferred to be the single most prevalent form of heterogeneity. Interestingly, RAS appears so dominant that failure to independently include it in the THMM masks other forms of heterogeneity, particularly temporal heterogeneity. Incorporating RAS into the THMM reveals substantial temporal and spatial heterogeneity in nucleotide composition and in the bias toward transition substitution in all alignments examined, although the relative importance of different forms of heterogeneity varies between data sets. Furthermore, the improvements in model fit observed when complexity is added to the model suggest that the THMMs used in this study do not capture all the evolutionary heterogeneity occurring in the data. These observations all indicate that current tests may consistently underestimate the degree of temporal heterogeneity occurring in data. Finally, there is a weak link between the amount of heterogeneity detected and the level of divergence between the sequences, suggesting that variability in the evolutionary process will be a particular problem for deep phylogeny.
