Evaluating the Adequacy of Molecular Clock Models Using Posterior Predictive Simulations

Abstract: Molecular clock models are commonly used to estimate evolutionary rates and timescales from nucleotide sequences. The goal of these models is to account for rate variation among lineages, and they are assumed to be adequate descriptions of the processes that generated the data. A common approach for selecting a clock model for a data set of interest is to examine a set of candidates and to select the model that provides the best statistical fit. However, this can lead to unreliable estimates if all the candidate models are actually inadequate. For this reason, a method of evaluating absolute model performance is critical. We describe a method that uses posterior predictive simulations to assess the adequacy of clock models. We test the power of this approach using simulated data and find that the method is sensitive to bias in the estimates of branch lengths, which tends to occur when using underparameterized clock models. We also compare the performance of the multinomial test statistic, originally developed to assess the adequacy of substitution models, but find that it has low power for assessing the adequacy of clock models. We illustrate the performance of our method using empirical data sets from coronaviruses, simian immunodeficiency virus, killer whales, and marine turtles. Our results indicate that methods of investigating model adequacy, including the one proposed here, should be routinely used in combination with traditional model selection in evolutionary studies. This will reveal whether a broader range of clock models needs to be considered in phylogenetic analysis.
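
As a rough illustration of the posterior predictive logic mentioned in the abstract, the sketch below computes a multinomial test statistic from site-pattern counts and compares an observed value against values from simulated data sets. It is a minimal sketch, not the authors' implementation: the function names are hypothetical, the statistic is assumed to take the usual form T(X) = sum_i N_i ln(N_i) - N ln(N) over unique site patterns, and the simulated values are assumed to come from an external simulation step that is not shown. The clock-model test described in the abstract relies on branch-length estimates and is not reproduced here.

```python
from collections import Counter
from math import log


def multinomial_statistic(alignment):
    """Multinomial test statistic for an alignment.

    `alignment` is a list of equal-length sequences (strings).
    Assumed form: sum over unique site patterns of N_i * ln(N_i),
    minus N * ln(N), where N is the number of sites.
    """
    n_sites = len(alignment[0])
    # A site pattern is the column of characters at one alignment position.
    patterns = Counter(
        tuple(seq[i] for seq in alignment) for i in range(n_sites)
    )
    return sum(c * log(c) for c in patterns.values()) - n_sites * log(n_sites)


def posterior_predictive_p(observed, simulated):
    """Fraction of simulated statistics at least as large as the observed one."""
    return sum(s >= observed for s in simulated) / len(simulated)


# Toy usage with made-up sequences (illustrative only, not real data).
empirical = ["ACGTAC", "ACGTAA", "ACGAAC"]
t_obs = multinomial_statistic(empirical)

# In practice each simulated alignment would be generated under parameters
# drawn from the posterior; here two hand-written alignments stand in.
simulated_alignments = [
    ["AAAAAA", "AAAAAA", "AAAAAA"],
    ["ACGTAC", "ACGTAC", "ACGTAC"],
]
t_sim = [multinomial_statistic(a) for a in simulated_alignments]
p_value = posterior_predictive_p(t_obs, t_sim)
```

A posterior predictive P-value near 0 or 1 would suggest that the observed statistic is atypical of data generated under the fitted model, which is the sense in which a model is judged inadequate.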
