Paper Information - Provably Fast and Accurate Recovery of Evolutionary Trees through Harmonic Greedy Triplets

We give a greedy learning algorithm for reconstructing an evolutionary tree based on a certain harmonic average on triplets of terminal taxa. After the pairwise distances between terminal taxa are estimated from sequence data, the algorithm runs in $O(n^2)$ time using $O(n)$ work space, where $n$ is the number of terminal taxa. These time and space complexities are optimal in the sense that the size of an input distance matrix is $n^2$ and the size of an output tree is $n$. Moreover, in the Jukes-Cantor model of evolution, the algorithm recovers the correct tree topology with high probability using sample sequences of length polynomial in (1) $n$, (2) the logarithm of the error probability, and (3) the inverses of two small parameters.
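
The abstract takes the pairwise distance estimates as given. As a rough illustration of that preprocessing step only, the sketch below computes the textbook Jukes-Cantor corrected distance, $d = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}p\right)$, where $p$ is the observed fraction of differing sites between two aligned sequences. The function names `jukes_cantor_distance` and `distance_matrix` are hypothetical, and the paper may define its distance differently; this is not the harmonic greedy triplets algorithm itself.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Textbook Jukes-Cantor distance between two aligned sequences.

    Assumption: sequences are equal-length strings over {A, C, G, T}
    with no gaps. This is an illustrative estimator, not necessarily
    the distance definition used in the paper.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    p = mismatches / len(seq_a)
    # The correction is undefined once p reaches the saturation level 3/4.
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def distance_matrix(seqs):
    """Symmetric matrix of estimated distances for all pairs of taxa."""
    count = len(seqs)
    dist = [[0.0] * count for _ in range(count)]
    for i in range(count):
        for j in range(i + 1, count):
            dist[i][j] = dist[j][i] = jukes_cantor_distance(seqs[i], seqs[j])
    return dist
```

The resulting matrix has one entry per pair of taxa, which is the quadratic input size the abstract refers to when it calls the running time optimal.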

Ming-Yang Kao | Miklós Csürös
