Accurate model selection of relaxed molecular clocks in Bayesian phylogenetics.

Recent implementations of path sampling (PS) and stepping-stone sampling (SS) have been shown to outperform the harmonic mean estimator (HME) and a posterior simulation-based analog of Akaike's information criterion through Markov chain Monte Carlo (AICM) in Bayesian model selection of demographic and molecular clock models. Almost simultaneously, a Bayesian model averaging approach was developed that avoids conditioning on a single model and instead averages over a set of relaxed clock models. This approach returns estimates of the posterior probability of each clock model, from which one can estimate the Bayes factor in favor of the maximum a posteriori (MAP) clock model; however, this Bayes factor estimate may become unreliable when the posterior probability of the MAP model approaches 1. Here, we compare these two recent developments with the HME, the stabilized/smoothed HME (sHME), and AICM, using both synthetic and empirical data. Our comparison shows, reassuringly, that MAP identification and its Bayes factor perform similarly to PS and SS, and that these approaches considerably outperform HME, sHME, and AICM in selecting the correct underlying clock model. Using a large collection of empirical data sets, we also illustrate the importance of using proper priors.
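
To make the MAP Bayes factor concrete, the following is a minimal sketch, not the implementation used in the study. It assumes the model-averaging run yields one sampled model indicator per MCMC state and that the Bayes factor compares the MAP model against all other models combined, computed as posterior odds divided by prior odds. The function name `map_bayes_factor` and the clock-model labels are hypothetical.

```python
from collections import Counter
import math


def map_bayes_factor(model_samples, prior_probs):
    """Return (MAP model, Bayes factor of MAP model versus all others).

    model_samples: sampled model indicators, one per MCMC state (assumed input).
    prior_probs:   dict mapping each model label to its prior probability.
    """
    counts = Counter(model_samples)
    n = len(model_samples)
    map_model, map_count = counts.most_common(1)[0]

    # If every sample visits the MAP model, the posterior odds are
    # undefined (division by zero); report infinity to flag the problem.
    if map_count == n:
        return map_model, math.inf

    posterior_odds = map_count / (n - map_count)
    prior = prior_probs[map_model]
    prior_odds = prior / (1.0 - prior)
    return map_model, posterior_odds / prior_odds


# Hypothetical indicator trace over two clock models with equal priors.
samples = ["lognormal"] * 90 + ["exponential"] * 10
priors = {"lognormal": 0.5, "exponential": 0.5}
model, bf = map_bayes_factor(samples, priors)
print(model, bf)
```

The infinite return value illustrates the weakness noted above: once the chain almost never leaves the MAP model, the estimate rests on very few visits to the alternatives (or none), so the posterior odds are poorly determined.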
