Fast and reliable reconstruction of phylogenetic trees with very short edges

Phylogenetic reconstruction is the problem of reconstructing an evolutionary tree from sequences corresponding to the leaves of that tree. A central goal in phylogenetic reconstruction is to reconstruct the tree as accurately as possible from input sequences that are as short as possible. The sequence length required for correct topological reconstruction depends on certain properties of the tree, such as its depth and its minimal edge weight. Fast-converging reconstruction algorithms are considered state-of-the-art in this sense, as they require asymptotically minimal sequence length in order to guarantee (with high probability) correct topological reconstruction of the entire tree. However, when the original phylogenetic tree contains very short edges, this minimal sequence length is still too long for practical purposes. Short edges are not only very hard to reconstruct; their presence may also prevent the correct reconstruction of long edges. In this paper we present a fast-converging reconstruction algorithm that returns a partially resolved topology containing all edges of the original tree whose weight exceeds some (non-trivial) lower bound. This bound is determined by the input sequence length and by properties of the tree such as its depth; it does not, however, depend on the minimal edge weight. The resulting partial reconstruction guarantee is strictly stronger than the guarantees given by other fast-converging algorithms. Our algorithm also has optimal complexity (linear space and quadratic time), which, together with its partial reconstruction guarantee, makes it appealing for practical use.
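One way to make the partial reconstruction guarantee concrete is to describe a tree topology by its splits, the leaf bipartitions induced by its internal edges; contracting an edge then simply removes its split. The Python sketch below illustrates only the form of the output (every edge at least as long as a threshold is kept, shorter ones are contracted), assuming the true edge weights are already known. It is not the paper's algorithm, which must infer this topology from sequences. The edge-list tree representation and the names `tree_splits`, `contract_short_edges` and `threshold` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

# Illustrative sketch only (assumption): it shows what the partial-reconstruction
# guarantee returns, not how the paper's algorithm reconstructs it from sequences.

Edge = Tuple[str, str, float]  # (endpoint, endpoint, edge weight)


def tree_splits(edges: List[Edge], leaves: Set[str]) -> Dict[FrozenSet[str], float]:
    """Map each internal edge of an unrooted tree to the leaf split it induces.

    Each split is keyed by the side NOT containing min(leaves), so an edge
    yields the same key regardless of the order of its endpoints.
    """
    adj: Dict[str, List[str]] = defaultdict(list)
    for u, v, _ in edges:
        adj[u].append(v)
        adj[v].append(u)

    all_leaves = frozenset(leaves)
    anchor = min(all_leaves)
    splits: Dict[FrozenSet[str], float] = {}
    for u, v, weight in edges:
        # Collect the leaves reachable from v without crossing the edge (u, v).
        side: Set[str] = set()
        stack, seen = [v], {u, v}
        while stack:
            x = stack.pop()
            if x in all_leaves:
                side.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        # Pendant edges give trivial splits (one leaf vs. the rest); skip them.
        if len(side) < 2 or len(side) > len(all_leaves) - 2:
            continue
        key = all_leaves - side if anchor in side else frozenset(side)
        splits[key] = weight
    return splits


def contract_short_edges(splits: Dict[FrozenSet[str], float],
                         threshold: float) -> Set[FrozenSet[str]]:
    """Keep the splits of edges whose weight is at least `threshold`.

    Dropping a split is the same as contracting its edge, so the result is a
    partially resolved (multifurcating) topology on the same leaf set.
    """
    return {s for s, w in splits.items() if w >= threshold}


if __name__ == "__main__":
    # Hypothetical 5-leaf tree ((A,B),C,(D,E)) with one very short internal edge y-z.
    edges = [("A", "x", 0.3), ("B", "x", 0.3), ("x", "y", 0.5), ("C", "y", 0.3),
             ("y", "z", 0.02), ("D", "z", 0.3), ("E", "z", 0.3)]
    splits = tree_splits(edges, {"A", "B", "C", "D", "E"})
    print(contract_short_edges(splits, threshold=0.1))
```

With `threshold=0.1` only the split {A, B} | {C, D, E} survives: the short edge y-z is contracted, leaving C, D and E attached to a single multifurcating node, while the long edge x-y remains resolved. Representing topologies as split sets makes contraction a simple filter, because any subset of a set of pairwise compatible splits is still pairwise compatible and therefore still describes a tree.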
