Improved bootstrap confidence limits in large-scale phylogenies, with an example from Neo-Astragalus (Leguminosae).

Phylogenetic analyses of large data sets pose special challenges, including the apparent tendency for the bootstrap support for a clade to decline with increased taxon sampling of that clade. We document this decline in data sets with increasing numbers of taxa in Astragalus, the most species-rich angiosperm genus. Support for one subclade, Neo-Astragalus, declined monotonically with increased sampling of taxa inside Neo-Astragalus, irrespective of whether parsimony or neighbor-joining methods were used and of which heuristic search algorithm was used (although more stringent algorithms tended to yield higher support). Three possible explanations for this decline were examined: (1) conflation of the most recent common ancestor of the taxon sample (and its bootstrap support) with the most recent common ancestor of the clade from which it was sampled; (2) computational limitations of heuristic search strategies; and (3) statistical bias in bootstrap proportions, especially that arising from random homoplasy distributed among taxa. The best explanation appears to be (3), although computational shortcomings (2) may explain some of the problem. The bootstrap proportion, as currently used in phylogenetic analysis, does not accurately capture the classical notion of confidence assessments on the null hypothesis of nonmonophyly, especially in large data sets. More accurate assessments of confidence as type I error levels (relying on iterated bootstrap methods) remove most of the monotonic decline in confidence with increasing numbers of taxa.
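
For readers unfamiliar with the quantity under discussion, the following is a minimal sketch of how a bootstrap proportion for a clade can be computed by resampling alignment columns with replacement. It is not the procedure used in the study: the function names, the toy alignment format (a dict of taxon name to sequence string), and in particular `toy_supports_clade` are assumptions made for illustration. The toy criterion stands in for a real tree search (parsimony or neighbor-joining) followed by a monophyly check, and the iterated bootstrap correction mentioned above is not implemented here.

```python
import random


def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def toy_supports_clade(alignment, ingroup):
    """Hypothetical stand-in for tree inference plus a monophyly check.

    Counts the ingroup as 'supported' when every ingroup pair is closer
    than any ingroup-outgroup pair. A real analysis would infer a tree
    from the alignment and test whether the ingroup is monophyletic.
    """
    outgroup = [t for t in alignment if t not in ingroup]
    within = max(hamming(alignment[a], alignment[b])
                 for a in ingroup for b in ingroup)
    between = min(hamming(alignment[a], alignment[b])
                  for a in ingroup for b in outgroup)
    return within < between


def resample_columns(alignment, rng):
    """Draw a pseudoreplicate: sample sites with replacement, same length."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols)
            for taxon, seq in alignment.items()}


def bootstrap_proportion(alignment, ingroup, n_reps=200, seed=0):
    """Fraction of pseudoreplicates in which the clade criterion holds."""
    rng = random.Random(seed)
    hits = sum(
        toy_supports_clade(resample_columns(alignment, rng), ingroup)
        for _ in range(n_reps)
    )
    return hits / n_reps


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = {
        "in1": "AAAAACGT",
        "in2": "AAAAACGA",
        "out1": "TTTTTCGT",
        "out2": "TTTTACGT",
    }
    print(bootstrap_proportion(toy, ["in1", "in2"]))
```

In a realistic setting the toy criterion would be replaced by a call to a tree-search program, which is where the heuristic-search limitations discussed in explanation (2) would enter.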
