Bayesian phylogenetic model selection using reversible jump Markov chain Monte Carlo.

A common problem in molecular phylogenetics is choosing a model of DNA substitution that explains the DNA sequence alignment well without introducing superfluous parameters. A number of methods have been used to choose among a small set of candidate substitution models, such as the likelihood ratio test, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and Bayes factors. Current implementations of these criteria are limited in that only a small set of models is examined, or in that the test does not allow easy comparison of non-nested models. In this article, we expand the pool of candidate substitution models to include all possible time-reversible models. This set includes seven models that have already been described. We show how Bayes factors can be calculated for these models using reversible jump Markov chain Monte Carlo, and apply the method to 16 DNA sequence alignments. For each data set, we compare the model with the best Bayes factor to the best models chosen using AIC and BIC. We find that the best model under any of these criteria is not necessarily the most complicated one; models with an intermediate number of substitution types typically do best. Moreover, almost all of the models that are chosen as best do not constrain a transition rate to be the same as a transversion rate, suggesting that the transition/transversion rate bias plays the largest role in determining which models are selected. Importantly, the reversible jump Markov chain Monte Carlo algorithm described here allows phylogeny (and other phylogenetic model parameters) to be estimated while accounting for uncertainty in the model of DNA substitution.
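
The abstract does not spell out how the candidate set is enumerated. One natural reading of "all possible time-reversible models" is every way of grouping the six substitution rates (AC, AG, AT, CG, CT, GT) into classes that share a rate. The Python sketch below is an illustration under that assumption, not the authors' implementation: it enumerates those groupings as restricted growth strings and counts how many never place a transition (AG, CT) in the same class as a transversion. The rate ordering and all function names are hypothetical choices made for this example.

```python
from collections import Counter

# Assumed ordering of the six exchangeability rates (an illustrative choice).
RATES = ("AC", "AG", "AT", "CG", "CT", "GT")
TRANSITIONS = {"AG", "CT"}  # purine<->purine and pyrimidine<->pyrimidine


def restricted_growth_strings(n):
    """Yield every set partition of n items as a restricted growth string.

    Position i holds the class label of item i; a new label may exceed the
    largest label used so far by at most one, so each partition appears once.
    """
    def extend(prefix, max_label):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for label in range(max_label + 2):
            yield from extend(prefix + [label], max(max_label, label))

    yield from extend([], -1)


def separates_ts_tv(model):
    """True if no rate class contains both a transition and a transversion."""
    ts_labels = {lab for rate, lab in zip(RATES, model) if rate in TRANSITIONS}
    tv_labels = {lab for rate, lab in zip(RATES, model) if rate not in TRANSITIONS}
    return ts_labels.isdisjoint(tv_labels)


MODELS = list(restricted_growth_strings(len(RATES)))
CLASS_COUNTS = Counter(len(set(m)) for m in MODELS)  # models per number of rate classes
N_SEPARATING = sum(separates_ts_tv(m) for m in MODELS)

print(len(MODELS), "candidate rate groupings")
print("by number of rate classes:", dict(sorted(CLASS_COUNTS.items())))
print(N_SEPARATING, "groupings keep transitions and transversions apart")
```

Under this encoding, the string `(0, 0, 0, 0, 0, 0)` corresponds to a single shared rate, `(0, 1, 0, 0, 1, 0)` to one transition class and one transversion class, and `(0, 1, 2, 3, 4, 5)` to six distinct rates. A reversible jump sampler would move between such groupings by merging or splitting classes; that step is not shown here.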
