P3: Phylogenetic Posterior Prediction in RevBayes

Abstract

Tests of absolute model fit are crucial in model‐based inference because poorly structured models can lead to biased parameter estimates. In Bayesian inference, posterior predictive simulations can be used to test absolute model fit. However, such tests have rarely been applied in phylogenetic inference because convenient and flexible software has been lacking. Here, we describe our newly implemented tests of model fit in the phylogenetics software RevBayes. The tests use posterior predictive simulation and are based on both data‐ and inference‐based test statistics. The implementation makes a large spectrum of models available through a user‐friendly and flexible interface.
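The general logic of a posterior predictive test with a data-based test statistic can be illustrated with a toy example. The sketch below is not RevBayes code and does not use a phylogenetic model. It assumes a simple Bernoulli model with a uniform Beta(1, 1) prior, and the function names (`count_switches`, `posterior_predictive_p_value`) are hypothetical names chosen for illustration. It draws parameter values from the posterior, simulates a replicate data set for each draw, and reports how often the simulated test statistic is at least as extreme as the observed one.

```python
import random


def count_switches(seq):
    """Test statistic: number of adjacent positions whose values differ."""
    return sum(a != b for a, b in zip(seq, seq[1:]))


def posterior_predictive_p_value(observed, n_sims=2000, seed=1):
    """Lower-tail posterior predictive p-value for the switch count.

    Assumed toy model: independent Bernoulli(theta) outcomes with a
    Beta(1, 1) prior, so the posterior is Beta(1 + k, 1 + n - k).
    """
    rng = random.Random(seed)
    n = len(observed)
    k = sum(observed)
    t_obs = count_switches(observed)

    n_as_extreme = 0
    for _ in range(n_sims):
        # 1. Draw a parameter value from the posterior distribution.
        theta = rng.betavariate(1 + k, 1 + n - k)
        # 2. Simulate a replicate data set of the same size under the model.
        replicate = [1 if rng.random() < theta else 0 for _ in range(n)]
        # 3. Compare the replicate's statistic with the observed statistic.
        if count_switches(replicate) <= t_obs:
            n_as_extreme += 1
    return n_as_extreme / n_sims


# Strongly clustered data that an independence model should fit poorly.
observed = [1] * 10 + [0] * 10
p_value = posterior_predictive_p_value(observed)
print("observed switches:", count_switches(observed))
print("posterior predictive p-value:", p_value)
```

A p-value near 0 or 1 indicates that data simulated under the model rarely resemble the observed data with respect to the chosen statistic, which is evidence of poor absolute fit. In a phylogenetic setting the same three steps apply, with alignments simulated from posterior samples of trees and substitution model parameters.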
