Provably Fast and Accurate Recovery of Evolutionary Trees through Harmonic Greedy Triplets

We give a greedy learning algorithm that reconstructs an evolutionary tree using a harmonic average defined on triplets of terminal taxa. Once the pairwise distances between terminal taxa have been estimated from sequence data, the algorithm runs in $O(n^2)$ time using $O(n)$ work space, where $n$ is the number of terminal taxa. These time and space complexities are optimal in the sense that the size of an input distance matrix is $n^2$ and the size of an output tree is $n$. Moreover, in the Jukes-Cantor model of evolution, the algorithm recovers the correct tree topology with high probability using sample sequences of length polynomial in (1) $n$, (2) the logarithm of the error probability, and (3) the inverses of two small parameters.
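The abstract assumes that pairwise distances between terminal taxa are estimated from sequence data before the algorithm runs. As a rough illustration of that preprocessing step only, the sketch below applies the standard four-state Jukes-Cantor correction to aligned sequences of equal length. The function names, the four-letter alphabet, and the handling of saturated pairs are assumptions made for this example; it is not the paper's estimator and it does not implement the harmonic greedy triplets algorithm itself.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Estimate the evolutionary distance between two aligned sequences.

    Uses the textbook four-state Jukes-Cantor correction
    d = -(3/4) * ln(1 - (4/3) * p), where p is the fraction of mismatched sites.
    Assumption: sequences are gap-free and already aligned.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    p = mismatches / len(seq_a)
    if p >= 0.75:
        # The correction is undefined at saturation; report an infinite distance.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(sequences):
    """Build the symmetric n-by-n matrix of pairwise distance estimates."""
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = jukes_cantor_distance(sequences[i], sequences[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix
```

Building the matrix takes on the order of $n^2$ sequence comparisons, which is consistent with the abstract's point that the input distance matrix already has size $n^2$.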
