Estimation of Phylogeny Using a General Markov Model

The non-homogeneous model of nucleotide substitution proposed by Barry and Hartigan (Stat Sci, 2: 191–210) is the most general model of DNA evolution that assumes an independent and identical process at each site. We present a computational solution for this model and use it to analyse two data sets, each of which violates one or more of the assumptions of stationarity, homogeneity, and reversibility. The log-likelihood values returned by programs based on the F84 model (J Mol Evol, 29: 170–179), the general time-reversible model (J Mol Evol, 20: 86–93), and Barry and Hartigan’s model are compared to assess whether the assumptions made by the first two models are valid. In addition, we present a method for assessing whether sequences have evolved under reversible conditions, and find that neither data set meets this condition. Finally, we determine the most likely tree under each of the three models of DNA evolution and compare these trees with the one favoured by the tests for symmetry.
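Under Barry and Hartigan’s general Markov model, each edge of a rooted tree carries its own 4 × 4 transition matrix. That matrix is constrained only to be row-stochastic, unlike F84 or the general time-reversible model, where it must have the form exp(Qt) for a reversible rate matrix Q. The root also carries its own base distribution π, and sites are independent and identically distributed. For fixed parameters, the log likelihood can be evaluated with Felsenstein’s pruning algorithm, as in the minimal sketch below.

The sketch rests on several assumptions. The dict-based tree encoding and the function names are hypothetical. Gaps and ambiguity codes are treated as missing data. The code only evaluates the likelihood; it is not the authors’ computational solution, which must also optimise over the parameters and trees.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
Matrix = List[List[float]]
# Hypothetical tree encoding: internal node -> [(child, P_edge), ...], where
# P_edge[x][y] = Pr(child is in state y | parent is in state x).
Tree = Dict[str, List[Tuple[str, Matrix]]]


def check_stochastic(P: Matrix, tol: float = 1e-9) -> None:
    """The general Markov model only requires each edge matrix to be row-stochastic."""
    for row in P:
        if any(p < 0 for p in row) or abs(sum(row) - 1.0) > tol:
            raise ValueError("edge matrix rows must be non-negative and sum to 1")


def partial_likelihood(node: str, tree: Tree, leaves: Dict[str, str], site: int) -> List[float]:
    """Felsenstein pruning: L[x] = Pr(leaf data below `node` at `site` | node in state x)."""
    if node in leaves:
        obs = leaves[node][site]
        if obs not in BASES:  # gap or ambiguity code: treated as missing data (assumption)
            return [1.0] * 4
        return [1.0 if b == obs else 0.0 for b in BASES]
    L = [1.0] * 4
    for child, P in tree[node]:
        Lc = partial_likelihood(child, tree, leaves, site)
        for x in range(4):
            # Sum over the child's state y, weighted by this edge's own transition matrix
            L[x] *= sum(P[x][y] * Lc[y] for y in range(4))
    return L


def log_likelihood(root: str, pi: List[float], tree: Tree, leaves: Dict[str, str]) -> float:
    """Log likelihood of an alignment, assuming independent and identically distributed sites."""
    for edges in tree.values():
        for _, P in edges:
            check_stochastic(P)
    n_sites = len(next(iter(leaves.values())))
    total = 0.0
    for s in range(n_sites):
        L = partial_likelihood(root, tree, leaves, s)
        # The root distribution pi need not be stationary for any edge matrix
        total += math.log(sum(pi[x] * L[x] for x in range(4)))
    return total


if __name__ == "__main__":
    # Arbitrary illustrative parameters, not estimates from any real data set.
    P1 = [[0.91, 0.03, 0.03, 0.03],
          [0.02, 0.94, 0.02, 0.02],
          [0.05, 0.05, 0.85, 0.05],
          [0.01, 0.01, 0.01, 0.97]]
    P2 = [[0.80, 0.10, 0.05, 0.05],
          [0.10, 0.80, 0.05, 0.05],
          [0.05, 0.05, 0.80, 0.10],
          [0.05, 0.05, 0.10, 0.80]]
    tree = {"root": [("s1", P1), ("n1", P2)],
            "n1": [("s2", P1), ("s3", P2)]}
    leaves = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "GCGTAC"}
    print(log_likelihood("root", [0.3, 0.2, 0.2, 0.3], tree, leaves))
```

Maximising this quantity over π, the edge matrices and the tree would give the maximum log likelihood that enters the model comparisons described above. That optimisation step is deliberately left out of this sketch.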
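A standard building block for the tests for symmetry mentioned above is Bowker’s (1948) matched-pairs test of symmetry [17]. It is applied to the 4 × 4 divergence matrix of two aligned sequences. If both sequences evolved from their common ancestor under the same stationary, reversible process, the expected divergence matrix is symmetric, so a large statistic is evidence against those conditions.

The minimal sketch below assumes several things. The helper names are hypothetical, and gaps and ambiguity codes are skipped. Degrees of freedom are counted only over base pairs with non-zero off-diagonal counts. It uses the asymptotic chi-square approximation, which can be poor when off-diagonal counts are small. It is not necessarily the exact procedure used by the authors.

```python
import math
from typing import List, Tuple

BASES = "ACGT"


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability via the series for the regularised lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16:  # series terms eventually shrink for any z > 0
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def divergence_matrix(seq1: str, seq2: str) -> List[List[int]]:
    """Counts N[i][j] of aligned sites with base i in seq1 and base j in seq2."""
    idx = {b: i for i, b in enumerate(BASES)}
    N = [[0] * 4 for _ in range(4)]
    for b1, b2 in zip(seq1.upper(), seq2.upper()):
        if b1 in idx and b2 in idx:  # skip gaps and ambiguity codes (assumption)
            N[idx[b1]][idx[b2]] += 1
    return N


def bowker_test(seq1: str, seq2: str) -> Tuple[float, int, float]:
    """Bowker's test of symmetry: returns (statistic, degrees of freedom, p-value)."""
    N = divergence_matrix(seq1, seq2)
    stat, df = 0.0, 0
    for i in range(4):
        for j in range(i + 1, 4):
            s = N[i][j] + N[j][i]
            if s > 0:  # pairs with no off-diagonal counts carry no information
                stat += (N[i][j] - N[j][i]) ** 2 / s
                df += 1
    p = chi2_sf(stat, df) if df > 0 else 1.0
    return stat, df, p
```

Applying `bowker_test` to every pair of sequences in an alignment gives a matrix of p-values. Pairs with small p-values point to the sequences whose evolution is least compatible with stationary, reversible and homogeneous conditions.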

[1] J. S. Rogers et al. Multiple local maxima for likelihoods of phylogenetic trees: a simulation study. 1999, Molecular Biology and Evolution.

[2] M. Hasegawa et al. CONSEL: for assessing the confidence of phylogenetic tree selection. 2001, Bioinformatics.

[3] L. Duret et al. GC-content evolution in mammalian genomes: the biased gene conversion hypothesis. 2001, Genetics.

[4] S. Tavaré. Some probabilistic and statistical problems in the analysis of DNA sequences. 1986.

[5] A. Eyre-Walker et al. Synonymous codon bias is not caused by mutation bias in G+C-rich genes in humans. 2001, Molecular Biology and Evolution.

[6] H. Kishino et al. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in hominoidea. 1989, Journal of Molecular Evolution.

[7] M. P. Cummings et al. PAUP* [Phylogenetic Analysis Using Parsimony (and Other Methods)]. 2004.

[8] S. Ho et al. Tracing the decay of the historical signal in biological sequence data. 2004, Systematic Biology.

[9] Z. Yang. Estimating the pattern of nucleotide substitution. 1994, Journal of Molecular Evolution.

[10] M. P. Cummings et al. PAUP* Phylogenetic analysis using parsimony (*and other methods) Version 4. 2000.

[11] F. Ababneh et al. Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. 2006, Bioinformatics.

[12] S. L. Kosakovsky Pond et al. HyPhy: hypothesis testing using phylogenies. 2005, Bioinformatics.

[13] M. Rattray et al. RNA-based phylogenetic methods: application to mammalian mitochondrial RNA sequences. 2003, Molecular Phylogenetics and Evolution.

[14] P. Goloboff. Analyzing large data sets in reasonable times: solutions for composite optima. 1999, Cladistics.

[15] J. Hartigan et al. Statistical analysis of hominoid molecular evolution. 1987, Statistical Science.

[16] G. Serio et al. A new method for calculating evolutionary substitution rates. 1984, Journal of Molecular Evolution.

[17] A. Bowker. A test for symmetry in contingency tables. 1948, Journal of the American Statistical Association.

[18] R. Sharma et al. Phylogeny estimation and hypothesis testing using maximum likelihood. 2003.

[19] D. Pearl et al. Stochastic search strategy for estimation of maximum likelihood phylogenetic trees. 2001, Systematic Biology.

[20] J. Oliver et al. The general stochastic model of nucleotide substitution. 1990, Journal of Theoretical Biology.

[21] D. Swofford. PAUP*: Phylogenetic analysis using parsimony (*and other methods), Version 4.0b10. 2002.

[22] S. Easteal et al. Departure from neutrality at the mitochondrial NADH dehydrogenase subunit 2 gene in humans, but not in chimpanzees. 1998, Genetics.

[23] M. P. Cummings. PHYLIP (Phylogeny Inference Package). 2004.

[24] P. Lewis et al. A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data. 1998, Molecular Biology and Evolution.

[25] B. D. McKay et al. TrExML: a maximum-likelihood approach for extensive tree-space exploration. 2000, Bioinformatics.

[26] J. Hartigan et al. Asynchronous distance between homologous DNA sequences. 1987, Biometrics.

[27] A. Eyre-Walker et al. Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. 1999, Genetics.

[28] A. Stuart. A test for homogeneity of the marginal distributions in a two-way classification. 1955.

[29] M. Gouy et al. Inferring phylogenies from DNA sequences of unequal base compositions. 1995, Proceedings of the National Academy of Sciences of the United States of America.

[30] F. Ababneh et al. Generation of the exact distribution and simulation of matched nucleotide sequences on a phylogenetic tree. 2006, Journal of Mathematical Modelling and Algorithms.

[31] J. Felsenstein et al. A hidden Markov model approach to variation among sites in rate of evolution. 1996, Molecular Biology and Evolution.

[32] A. Rambaut et al. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. 1997, Computer Applications in the Biosciences.

[33] Z. Yang et al. On the use of nucleic acid sequences to infer early branchings in the tree of life. 1995, Molecular Biology and Evolution.

[34] P. G. Foster et al. Modeling compositional heterogeneity. 2004, Systematic Biology.

[35] D. Penny et al. Branch and bound algorithms to determine minimal evolutionary trees. 1982.

[36] H. Shimodaira et al. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. 1999, Molecular Biology and Evolution.

[37] D. Bryant et al. Likelihood calculation in molecular phylogenetics. 2007, Mathematics of Evolution and Phylogeny.

[38] F. Ababneh et al. The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. 2004, Systematic Biology.

[39] M. Steel. Recovering a tree from the leaf colourations it generates under a Markov model. 1994.

[40] O. Gascuel et al. Mathematics of Evolution and Phylogeny. 2005.

[41] R. Gupta et al. The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes. 2000, FEMS Microbiology Reviews.

[42] M. Steel et al. Recovering evolutionary trees under a more realistic model of sequence evolution. 1994, Molecular Biology and Evolution.

[43] N. Metropolis et al. Equation of state calculations by fast computing machines. 1953, Journal of Chemical Physics.

[44] C. Noviello et al. Catarrhine primate divergence dates estimated from complete mitochondrial genomes: concordance with fossil and nuclear DNA evidence. 2005, Journal of Human Evolution.

[45] H. Shimodaira. An approximately unbiased test of phylogenetic tree selection. 2002, Systematic Biology.

[46] M. Vingron et al. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. 2002, Bioinformatics.

[47] B. R. Holland et al. Multiple maxima of likelihood in phylogenetic trees: an analytic approach. 2000, RECOMB '00.