Bootstrapping phylogenies: Large deviations and dispersion effects

SUMMARY

A large deviation result is established for the bootstrap empirical distribution in a finite sample space, thereby validating both nonparametric and parametric bootstrapping in certain phylogenetic inference problems. The bias previously observed in the bootstrap distribution of the estimated tree topology is shown to stem from dispersion effects in the joint distribution of sample and bootstrap empirical distributions. Both results are examined for maximum likelihood estimation in a three-taxon model having particularly simple geometry. They are also applicable to discrete parameter problems outside of phylogenetic inference.
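
As a rough illustration of the objects named in the summary, the sketch below builds a sample empirical distribution on a finite sample space, resamples it nonparametrically, and tabulates how often a discrete estimate takes each value across bootstrap replicates. It is not the paper's three-taxon model or its maximum likelihood estimator: the four-cell sample space, the probabilities in `true_p`, and the rule in `discrete_estimate` (pick the largest of the first three cell frequencies) are all hypothetical stand-ins chosen only to make the example runnable.

```python
import random
from collections import Counter

def empirical(sample, k):
    """Empirical distribution of a sample over the finite space {0, ..., k-1}."""
    counts = Counter(sample)
    n = len(sample)
    return [counts.get(i, 0) / n for i in range(k)]

def discrete_estimate(p):
    """Hypothetical discrete estimator: index of the largest of the first
    three cell frequencies (ties go to the lowest index)."""
    return max(range(3), key=lambda i: p[i])

def bootstrap_support(sample, k, n_boot, rng):
    """Fraction of nonparametric bootstrap replicates giving each estimate."""
    n = len(sample)
    tally = Counter()
    for _ in range(n_boot):
        # Draw n observations with replacement from the observed sample.
        resample = rng.choices(sample, k=n)
        tally[discrete_estimate(empirical(resample, k))] += 1
    return [tally.get(i, 0) / n_boot for i in range(3)]

rng = random.Random(0)                  # fixed seed for repeatability
true_p = [0.40, 0.25, 0.25, 0.10]       # assumed cell probabilities
sample = rng.choices(range(4), weights=true_p, k=200)

p_hat = empirical(sample, 4)
support = bootstrap_support(sample, 4, 500, rng)
print("sample empirical distribution:", p_hat)
print("estimate from the sample:", discrete_estimate(p_hat))
print("bootstrap support per candidate:", support)
```

The bootstrap support values are the kind of quantity whose behaviour the summary describes: they depend jointly on where the sample empirical distribution falls and on how the bootstrap empirical distributions scatter around it.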
