Distance-based phylogenetic algorithms: New insights and applications

Phylogenetic methods have recently been rediscovered in several areas, including immunodynamics, epidemiology and many branches of evolutionary dynamics. In many cases of interest, the reconstruction of a correct phylogeny is blurred by high mutation rates and/or horizontal transfer events. As a consequence, the distances between pairs of taxa inferred from the available data diverge from the true evolutionary distances, which makes phylogenetic reconstruction a challenging problem. Mathematically, this divergence translates into the non-additivity of the inferred distances between taxa, and the quest for new algorithms able to cope efficiently with these effects is wide open. Distance-based reconstruction methods have extensively exploited two properties of additive distances as antagonistic criteria to drive phylogeny reconstruction: on the one hand, the four-point condition, a local property of quartets (sets of four taxa in a tree); on the other hand, Pauplin's formula, a recently proposed expression that gives the tree length as a function of the distances between taxa. A deeper understanding of how non-additivity affects the principles underlying the existing reconstruction algorithms is thus of paramount importance. In this paper we present a comparative analysis of the performance of the most important distance-based phylogenetic algorithms. We focus in particular on how their performance depends on two main sources of non-additivity: back-mutation processes and horizontal transfer processes. The comparison is carried out within a set of generative algorithms for phylogenies that incorporate non-additivity in a tunable way.
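The two properties of additive distances named above can be illustrated on a toy example. The following is a minimal sketch, not the authors' implementation: the function names, the four-taxon tree, its branch lengths and the perturbed matrix are all hypothetical and chosen only for illustration. It assumes the standard statements of both properties: the four-point condition requires that, of the three pairwise sums d(i,j)+d(k,l), d(i,k)+d(j,l) and d(i,l)+d(j,k), the two largest are equal; Pauplin's formula, for a fully resolved tree, writes the tree length as the sum over leaf pairs of 2^(1-p_ij) d(i,j), where p_ij is the number of edges on the path between leaves i and j.

```python
from collections import deque
from itertools import combinations


def four_point_holds(d, i, j, k, l, tol=1e-9):
    """Four-point condition on one quartet: the two largest of the
    three pairwise distance sums must be equal (up to tol)."""
    sums = sorted((d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]))
    return abs(sums[2] - sums[1]) <= tol


def satisfies_four_point(d, tol=1e-9):
    """Check the four-point condition on every quartet of distinct taxa."""
    return all(four_point_holds(d, *q, tol=tol)
               for q in combinations(range(len(d)), 4))


def edge_counts_from(adj, src):
    """Breadth-first search: number of edges from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pauplin_length(adj, leaves, d):
    """Tree length from Pauplin's formula for a fully resolved topology.
    adj: node -> list of neighbours; leaves[a] is the node of taxon a;
    d[a][b] is the distance between taxa a and b."""
    total = 0.0
    for a, b in combinations(range(len(leaves)), 2):
        p = edge_counts_from(adj, leaves[a])[leaves[b]]  # edges on the path
        total += 2.0 ** (1 - p) * d[a][b]
    return total


# Hypothetical quartet ((0,1),(2,3)) with internal nodes 4 and 5.
# Branch lengths (assumed): 0-4: 1, 1-4: 2, 4-5: 3, 2-5: 4, 3-5: 5.
ADJ = {0: [4], 1: [4], 2: [5], 3: [5], 4: [0, 1, 5], 5: [2, 3, 4]}
LEAVES = [0, 1, 2, 3]

# Additive distances induced by the tree above.
D = [[0, 3, 8, 9],
     [3, 0, 9, 10],
     [8, 9, 0, 9],
     [9, 10, 9, 0]]

# Same matrix with d(0,3) shortened, mimicking a non-additive distortion.
D_NOISY = [row[:] for row in D]
D_NOISY[0][3] = D_NOISY[3][0] = 7

print(satisfies_four_point(D))        # additive matrix
print(satisfies_four_point(D_NOISY))  # perturbed matrix
print(pauplin_length(ADJ, LEAVES, D)) # equals the sum of branch lengths
```

On the additive matrix the quartet check passes and the formula returns the sum of the branch lengths; once a single distance is perturbed, the four-point condition fails, which is the kind of non-additivity the abstract refers to.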

[18]  Olivier Gascuel,et al.  Fast and Accurate Phylogeny Reconstruction Algorithms Based on the Minimum-Evolution Principle , 2002, J. Comput. Biol..