Estimating the Effective Sample Size of Tree Topologies from Bayesian Phylogenetic Analyses

Bayesian phylogenetic analyses use Markov chain Monte Carlo (MCMC) methods to estimate posterior distributions of tree topologies and other parameters. Before drawing inferences from these distributions, it is important to assess whether the chain has sampled them adequately. To this end, the effective sample size (ESS) estimates how many independent samples of a given parameter the MCMC output represents. The ESS of a parameter is frequently much lower than the number of samples taken, because successive samples from the chain are autocorrelated. Phylogeneticists typically follow the rule of thumb that the ESS of every parameter should exceed 200. However, no method exists to calculate the ESS of tree topology samples, even though the tree topology is often the parameter of primary interest and is almost always central to the estimation of other parameters. That is, we cannot determine whether we have adequately sampled one of the most important parameters in our analyses. In this study, we address this problem by developing methods to estimate the ESS of tree topologies. We combine these methods with two new diagnostic plots for assessing posterior samples of tree topologies, and compare their performance on simulated and empirical data sets. Together, these methods provide new ways to assess the mixing and convergence of tree topologies in Bayesian MCMC analyses.
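
To make the ESS concept concrete, the following is a minimal sketch of a common autocorrelation-based estimator for a single continuous parameter trace. It is not the tree topology method developed in this study. The function name `effective_sample_size`, the formula ESS = N / (1 + 2 Σ ρ_k), and the rule of truncating the sum at the first non-positive autocorrelation are assumptions chosen for illustration; established tools may use different truncation rules.

```python
import numpy as np


def effective_sample_size(samples):
    """Illustrative ESS estimate for a 1-D trace of a continuous parameter.

    Assumed estimator: ESS = N / (1 + 2 * sum of rho_k), where rho_k is the
    lag-k sample autocorrelation and the sum stops at the first lag whose
    autocorrelation is not positive (a simple, hypothetical truncation rule).
    """
    x = np.asarray(samples, dtype=float)
    n = x.size
    if n < 2:
        return float(n)

    centered = x - x.mean()
    # Sample autocovariance at lags 0..n-1 (biased estimator, divides by n).
    acov = np.correlate(centered, centered, mode="full")[n - 1:] / n
    if acov[0] == 0.0:
        # A constant trace has no defined autocorrelation.
        return float("nan")
    rho = acov / acov[0]

    # Autocorrelation time: 1 + 2 * (sum of leading positive autocorrelations).
    tau = 1.0
    for k in range(1, n):
        if rho[k] <= 0.0:
            break
        tau += 2.0 * rho[k]

    # The estimate is capped at the raw number of samples.
    return min(float(n), n / tau)
```

Under these assumptions, a strongly autocorrelated trace yields an ESS far below the number of samples drawn, while a trace of independent draws yields an ESS close to it. This is the quantity that the rule of thumb compares against 200.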
