Inferring evolutionary trees with strong combinatorial evidence

We consider the problem of inferring the evolutionary tree of a set of n species. We propose a quartet reconstruction method that specifically produces trees whose edges have strong combinatorial evidence. Given a set Q of resolved quartets defined on the studied species, the method computes the unique maximum subset Q∗ of Q that is equivalent to a tree and outputs the corresponding tree as an estimate of the species' phylogeny. We use a characterization of the subset Q∗ due to Bandelt and Dress (Adv. Appl. Math. 7 (1986) 309–343) to provide an O(n^4) incremental algorithm for this variant of the NP-hard quartet consistency problem. Moreover, when the resolution of each quartet is chosen by the four-point method (FPM) under the Cavender–Farris model of evolution, we show that the convergence rate of the Q∗ method is at worst polynomial, provided the maximum evolutionary distance between two species is bounded. We complement these theoretical results with an experimental study on real and simulated data sets. The results show that (i) as expected, the strong combinatorial constraints that the Q∗ method imposes on each edge lead it to propose very few incorrect edges; (ii) more surprisingly, the method infers trees with a relatively high degree of resolution.
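To make the quartet resolution step concrete, the following is a minimal sketch of the four-point method as it is commonly stated: among the three possible pairings of four species, keep the one whose sum of within-pair distances is smallest. It is an illustration under stated assumptions, not the authors' implementation. The function names `four_point_quartet` and `resolved_quartets`, the nested-dictionary distance format, and the choice to leave tied quartets unresolved are all assumptions made here.

```python
from itertools import combinations


def four_point_quartet(dist, a, b, c, d):
    """Resolve the quartet {a, b, c, d} from pairwise distances.

    `dist` is assumed to be a symmetric nested dict: dist[x][y] is the
    estimated evolutionary distance between species x and y.
    Returns a pair of pairs such as (("A", "B"), ("C", "D")), meaning
    AB|CD, or None when the two smallest sums tie (an assumption of this
    sketch: ties are treated as unresolved).
    """
    sums = {
        ((a, b), (c, d)): dist[a][b] + dist[c][d],
        ((a, c), (b, d)): dist[a][c] + dist[b][d],
        ((a, d), (b, c)): dist[a][d] + dist[b][c],
    }
    ranked = sorted(sums.items(), key=lambda item: item[1])
    if ranked[0][1] == ranked[1][1]:
        return None  # no strictly smallest sum: leave the quartet unresolved
    return ranked[0][0]


def resolved_quartets(dist):
    """Build a quartet set Q by applying the rule to every 4-subset of species."""
    taxa = sorted(dist)
    quartets = set()
    for a, b, c, d in combinations(taxa, 4):
        q = four_point_quartet(dist, a, b, c, d)
        if q is not None:
            quartets.add(q)
    return quartets
```

For distances that fit a tree ((A,B),(C,D)) exactly, the sum d(A,B) + d(C,D) is strictly smaller than the other two, so the sketch returns AB|CD. Enumerating all 4-subsets already takes on the order of n^4 steps, which is the same order as the running time quoted for the incremental algorithm.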
