Choosing the tree which actually best explains the data: another look at the bootstrap in phylogenetic reconstruction

We consider the problem of phylogenetic reconstruction, which consists in estimating the evolutionary history of a set of species. This unknown history is modelled as a tree and estimated from nucleotide sequences taken from the species' genomes. The first goal of the estimation is to produce a tree that is structurally as close as possible to the true tree. However, most phylogenetic tree-building methods rely on optimization criteria that lead to inferring fully resolved trees, i.e., models of maximal complexity. Such trees therefore usually contain some wrong edges that are too specific to the data, i.e., that result from an overfitting effect. We first introduce a structural goodness-of-fit criterion based on quartets of species. Then we describe a tree-building method that infers a fully resolved tree by optimizing this criterion. We present two descending approaches to remove unreliable edges from this tree. The first relies on the bootstrap process (Efron, 1979), as introduced in the phylogenetic field by Felsenstein (1985). The second is original in this context but analogous to usual methods in model calibration. Simulations show the efficiency of both approaches, in that the structural distance between the true tree and the estimated tree is significantly reduced. © 2000 Elsevier Science B.V. All rights reserved.
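
The bootstrap-based edge removal mentioned above can be illustrated with a minimal sketch. It follows the general idea of Felsenstein (1985), resampling alignment sites with replacement and measuring how often each edge of the reference tree reappears, and is not the authors' quartet-based method. The names `bootstrap_alignment`, `edge_support`, `collapse_unreliable`, the caller-supplied `infer_splits` function and the default threshold of 0.5 are all assumptions made for illustration.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns (sites) with replacement.

    `alignment` maps each species name to its aligned sequence;
    all sequences are assumed to have the same length.
    """
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


def edge_support(alignment, infer_splits, n_replicates=100, seed=0):
    """Fraction of replicates in which each reference edge reappears.

    `infer_splits` is a hypothetical, caller-supplied tree builder: it
    takes an alignment and returns the set of internal edges of the
    inferred tree, each encoded as a frozenset holding the species on
    one canonical side of the edge.
    """
    rng = random.Random(seed)
    reference = infer_splits(alignment)
    counts = Counter()
    for _ in range(n_replicates):
        replicate = infer_splits(bootstrap_alignment(alignment, rng))
        # Count only edges that are also present in the reference tree.
        counts.update(reference & replicate)
    return {split: counts[split] / n_replicates for split in reference}


def collapse_unreliable(support, threshold=0.5):
    """Keep only edges whose support reaches the (assumed) threshold."""
    return {split for split, value in support.items() if value >= threshold}
```

Any tree-building method can be plugged in as `infer_splits`, provided it encodes each edge consistently (for example, always reporting the side that does not contain a fixed reference species). The threshold is a placeholder; choosing it appropriately is itself a question studied in the literature listed below.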

[1]  Robert Tibshirani,et al.  An Introduction to the Bootstrap , 1994 .

[2]  M. Nei,et al.  Four-cluster analysis: a simple method to test phylogenetic hypotheses. , 1995, Molecular biology and evolution.

[3]  Vincent Berry Méthodes et algorithmes pour reconstruire les arbres de l'Evolution , 1997 .

[4]  M. Nei,et al.  The neighbor-joining method , 1987 .

[5]  O Gascuel,et al.  BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. , 1997, Molecular biology and evolution.

[6]  Olivier Gascuel,et al.  On the Interpretation of Bootstrap Trees: Appropriate Threshold of Clade Selection and Induced Gain , 1996 .

[7]  Elizabeth Halloran, Bradley Efron.  Bootstrap confidence levels for phylogenetic trees , 1996 .

[8]  M. Gouy,et al.  Statistical tests of molecular phylogenies. , 1990, Methods in enzymology.

[9]  A. Dress,et al.  Reconstructing the shape of a tree from observed dissimilarity data , 1986 .

[10]  J. Felsenstein CONFIDENCE LIMITS ON PHYLOGENIES: AN APPROACH USING THE BOOTSTRAP , 1985, Evolution; international journal of organic evolution.

[11]  M. Steel The complexity of reconstructing trees from qualitative characters and subtrees , 1992 .

[12]  D. Robinson,et al.  Comparison of phylogenetic trees , 1981 .

[13]  O. Gascuel,et al.  A reduction algorithm for approximating a (nonmetric) dissimilarity by a tree distance , 1996 .

[14]  STATISTICAL TESTS OF DNA PHYLOGENIES , 1995 .

[15]  B. Efron,et al.  Bootstrap confidence levels for phylogenetic trees. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[16]  P. Buneman A Note on the Metric Properties of Trees , 1974 .

[17]  A. Zharkikh,et al.  Estimation of confidence in phylogeny: the complete-and-partial bootstrap technique. , 1995, Molecular phylogenetics and evolution.

[18]  ScienceDirect Computational statistics & data analysis , 1983 .

[19]  A. Tversky,et al.  Additive similarity trees , 1977 .

[20]  李幼升,et al.  Ph , 1989 .

[21]  T. Jukes CHAPTER 24 – Evolution of Protein Molecules , 1969 .

[22]  N. Saitou,et al.  The neighbor-joining method: a new method for reconstructing phylogenetic trees. , 1987, Molecular biology and evolution.

[23]  B. Larget,et al.  Markov Chain Monte Carlo Algorithms for the Bayesian Analysis of Phylogenetic Trees , 2000 .

[24]  J. Felsenstein,et al.  A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. , 1994, Molecular biology and evolution.

[25]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[26]  J. Bull,et al.  An Empirical Test of Bootstrapping as a Method for Assessing Confidence in Phylogenetic Analysis , 1993 .