Two stationary nonhomogeneous Markov models of nucleotide sequence evolution.

The general Markov model (GMM) of nucleotide substitution does not assume the evolutionary process to be stationary, reversible, or homogeneous. The GMM can be simplified by assuming that the process is stationary. A stationary GMM is appropriate for analyses of phylogenetic data sets that are compositionally homogeneous; a data set is considered compositionally homogeneous if a statistical test does not detect significant differences in the marginal distributions of the sequences. Although the general time-reversible (GTR) model assumes stationarity, it also assumes reversibility and homogeneity. We propose two new stationary and nonhomogeneous models: one constrains the GMM to be reversible, whereas the other does not. The two models, together with the GTR model, form a set of nested models that can be used to test the assumptions of reversibility and homogeneity for stationary processes. The two models are extended to incorporate invariable sites and are used to analyze a seven-taxon hominoid data set that displays compositional homogeneity. We show that, within the class of stationary models, a nonhomogeneous model fits the hominoid data better than the GTR model. We note that if one considers a wider set of models that are not constrained to be stationary, an even better fit can be obtained for the hominoid data. However, methods for reducing model complexity within an extremely large set of nonstationary models have yet to be developed.
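The three assumptions named above can be made concrete with a minimal sketch. It is an illustration only, not the estimation procedure of the paper: the function names, the two example matrices, and the edge labels are hypothetical, and a discrete transition matrix per edge is assumed. A distribution pi is stationary for a transition matrix P when pi P = pi, the pair is reversible when pi_i P_ij = pi_j P_ji for all i and j, and a process is nonhomogeneous when different edges of the tree carry different matrices.

```python
from typing import Dict, List

Matrix = List[List[float]]


def is_stationary(pi: List[float], P: Matrix, tol: float = 1e-9) -> bool:
    """Return True if pi is unchanged by one step of P (pi P = pi)."""
    n = len(pi)
    for j in range(n):
        inflow = sum(pi[i] * P[i][j] for i in range(n))
        if abs(inflow - pi[j]) > tol:
            return False
    return True


def is_reversible(pi: List[float], P: Matrix, tol: float = 1e-9) -> bool:
    """Return True if detailed balance holds: pi_i P_ij = pi_j P_ji."""
    n = len(pi)
    return all(
        abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical example over the four nucleotides (A, C, G, T).
uniform = [0.25, 0.25, 0.25, 0.25]

# Symmetric matrix: stationary and reversible under the uniform distribution.
P_symmetric: Matrix = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]

# Cyclic matrix: rows and columns each sum to 1, so the uniform distribution
# is stationary, but forward and backward rates differ, so it is not reversible.
P_cyclic: Matrix = [
    [0.7, 0.2, 0.0, 0.1],
    [0.1, 0.7, 0.2, 0.0],
    [0.0, 0.1, 0.7, 0.2],
    [0.2, 0.0, 0.1, 0.7],
]

# A nonhomogeneous process assigns different matrices to different edges.
edges: Dict[str, Matrix] = {"edge_a": P_symmetric, "edge_b": P_cyclic}

stationary_on_all_edges = all(is_stationary(uniform, P) for P in edges.values())
reversible_on_all_edges = all(is_reversible(uniform, P) for P in edges.values())
homogeneous = all(P == P_symmetric for P in edges.values())
```

In this toy setting the process is stationary on every edge while being neither homogeneous nor reversible throughout, which is the situation the less constrained of the two proposed models is meant to accommodate.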
