How Many Bootstrap Replicates Are Necessary?

Phylogenetic bootstrapping (BS) is a standard technique for inferring confidence values on phylogenetic trees. It works by reconstructing many trees, called replicates, from minor variations of the input data. BS is used with all phylogenetic reconstruction approaches, but we focus here on the most popular, Maximum Likelihood (ML). Because ML inference is so computationally demanding, it has so far been too expensive to assess how the number of replicates used in BS affects the quality of the support values. For the same reason, real-world studies compute only a small number of BS replicates (typically 100). Stamatakis et al. recently introduced a BS algorithm that is one to two orders of magnitude faster than previous techniques while yielding qualitatively comparable support values, which makes an experimental study possible. In this paper, we propose stopping criteria, that is, thresholds computed at runtime to determine when enough replicates have been generated, and we report on the first large-scale experimental study to assess the effect of the number of replicates on the quality of support values, including the performance of our proposed criteria. We run our tests on 17 diverse real-world DNA datasets, single-gene as well as multi-gene, that include between 125 and 2,554 sequences. We find that our stopping criteria typically stop computations after 100 to 500 replicates (although the most conservative criterion may continue for several thousand replicates) while producing support values that correlate at better than 99.5% with the reference values on the best ML trees. Significantly, we also find that the stopping criteria can recommend very different numbers of replicates for different datasets of comparable sizes.
Our results are thus twofold: (i) they give the first experimental assessment of the effect of the number of BS replicates on the quality of support values returned through bootstrapping; and (ii) they validate our proposals for stopping criteria. Practitioners will no longer have to guess the number of replicates or worry about the quality of support values; moreover, with most replicate counts in the 100 to 500 range, robust BS under ML inference becomes computationally practical for most datasets. The complete test suite is available at http://lcbb.epfl.ch/BS.tar.bz2 and BS with our stopping criteria is included in RAxML 7.1.0.
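
To make the idea of a runtime stopping criterion concrete, the following Python sketch shows one plausible form such a test could take. It is an illustration under stated assumptions, not the criteria implemented in RAxML: each replicate is assumed to be represented as a set of bipartition labels, the replicates are repeatedly split into two random halves, and the test passes when the bipartition frequencies of the two halves agree closely. The function names, the representation of trees, and the thresholds (`min_corr`, `min_pass`, `permutations`) are all hypothetical choices made for this example.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Degenerate case: at least one vector is constant.
        return 1.0 if x == y else 0.0
    return sxy / (sxx * syy) ** 0.5


def split_frequencies(replicates, rng):
    """Split replicates into two random halves; return per-half bipartition frequencies."""
    shuffled = replicates[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    first, second = shuffled[:half], shuffled[half:2 * half]
    # Every bipartition seen in either half gets one slot in both vectors.
    splits = list(set().union(*first, *second))
    f1 = [sum(s in r for r in first) / half for s in splits]
    f2 = [sum(s in r for r in second) / half for s in splits]
    return f1, f2


def enough_replicates(replicates, permutations=100, min_corr=0.99,
                      min_pass=0.99, seed=0):
    """Return True if the two halves agree in most random splits.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    if len(replicates) < 2:
        return False
    rng = random.Random(seed)
    passed = 0
    for _ in range(permutations):
        f1, f2 = split_frequencies(replicates, rng)
        if pearson(f1, f2) >= min_corr:
            passed += 1
    return passed >= min_pass * permutations
```

In use, a driver loop would generate replicates in batches and call `enough_replicates` after each batch, stopping as soon as it returns `True`. Because the test depends on how stable the bipartition frequencies are rather than on dataset size, a check of this kind could plausibly recommend very different replicate counts for datasets of comparable size.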
