A few logs suffice to build (almost) all trees: part I

Inferring evolutionary trees is an important problem in biology, and a computationally difficult one, since most of the associated optimization problems are NP-hard. Although many methods are known to be statistically consistent (i.e., the probability of recovering the correct tree converges to 1 as the sequence length increases), the rate at which different methods converge has not been well understood. In a recent paper we introduced a new method for reconstructing evolutionary trees, the Dyadic Closure Method (DCM), and showed that it has a very fast convergence rate. However, DCM runs in O(n^5 log n) time, where n is the number of sequences; although this is polynomial, it may be too costly to be useful in practice. In this paper we present another tree reconstruction method, the Witness-Antiwitness Method (WAM). WAM converges at the same rate as DCM and is significantly faster, especially on random trees. We also compare WAM to other tree reconstruction methods, including Neighbor Joining (possibly the most popular method among molecular biologists) and new methods introduced in the computer science literature.
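To make the notion of statistical consistency concrete, here is a minimal simulation sketch. It is not DCM or WAM. It uses the classical four-point method on a single four-leaf tree ((a,b),(c,d)) under the two-state Cavender-Farris model, with corrected distances d = -1/2 ln(1 - 2h), where h is the fraction of sites at which two sequences differ. The edge substitution probabilities in `EDGE_P` and the helper names are hypothetical choices for illustration only. The fraction of trials that recover the true split should approach 1 as the sequence length k grows.

```python
import math
import random

# Hypothetical edge substitution probabilities for the quartet ((a,b),(c,d)).
# "internal" is the edge separating {a, b} from {c, d}.
EDGE_P = {"a": 0.1, "b": 0.1, "c": 0.1, "d": 0.1, "internal": 0.1}


def flip(bit, p, rng):
    """Change a binary state with probability p (Cavender-Farris model)."""
    return bit ^ (rng.random() < p)


def simulate_sites(k, rng):
    """Generate k independent binary sites at the four leaves."""
    seqs = {leaf: [] for leaf in "abcd"}
    for _ in range(k):
        u = rng.randint(0, 1)                 # state at the node joining a and b
        v = flip(u, EDGE_P["internal"], rng)  # state at the node joining c and d
        seqs["a"].append(flip(u, EDGE_P["a"], rng))
        seqs["b"].append(flip(u, EDGE_P["b"], rng))
        seqs["c"].append(flip(v, EDGE_P["c"], rng))
        seqs["d"].append(flip(v, EDGE_P["d"], rng))
    return seqs


def cf_distance(x, y):
    """Corrected distance -1/2 ln(1 - 2h); infinite once h reaches 1/2."""
    h = sum(xi != yi for xi, yi in zip(x, y)) / len(x)
    return math.inf if h >= 0.5 else -0.5 * math.log(1 - 2 * h)


def four_point_split(seqs):
    """Return the split with the strictly smallest distance sum, or None on a tie."""
    def d(i, j):
        return cf_distance(seqs[i], seqs[j])

    sums = {
        "ab|cd": d("a", "b") + d("c", "d"),
        "ac|bd": d("a", "c") + d("b", "d"),
        "ad|bc": d("a", "d") + d("b", "c"),
    }
    best = min(sums.values())
    winners = [split for split, total in sums.items() if total == best]
    return winners[0] if len(winners) == 1 else None  # ties count as failures


def success_rate(k, trials=100, seed=0):
    """Fraction of simulated data sets of length k that yield the true split."""
    rng = random.Random(seed)
    hits = sum(four_point_split(simulate_sites(k, rng)) == "ab|cd"
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for k in (20, 200, 2000):
        print(k, success_rate(k))
```

The rate at which such success probabilities approach 1, as a function of k and of the number of leaves, is exactly the convergence rate the abstract refers to; making the internal edge shorter or the pendant edges longer in `EDGE_P` slows convergence noticeably.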
