Approximating Subtree Distances Between Phylogenies

We give a 5-approximation algorithm for the rooted Subtree-Prune-and-Regraft (rSPR) distance between two phylogenies, a problem recently shown to be NP-complete. This is the first approximation result for this important tree distance. The algorithm follows a standard format for tree distances; the novel ideas are in the analysis, where the cost of the algorithm is accounted for using a "cascading" scheme that covers possible wrong moves. This accounting is missing from previous analyses of tree distance approximation algorithms. Further, we show how all algorithms of this type can be implemented in linear time, and we give experimental results.
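
To make the distance concrete, the following is a minimal sketch of a single rSPR move on a rooted binary tree. It is an illustration only, not the approximation algorithm of the paper. The nested-tuple tree representation and the helper names `prune`, `regraft`, and `rspr_move` are assumptions introduced here; leaf labels are assumed to be unique strings. The rSPR distance between two phylogenies is the minimum number of such moves needed to turn one into the other.

```python
# Hypothetical representation: a leaf is a string, an internal node is a
# pair (left, right). Leaf labels are assumed unique, so every subtree
# can be identified by value.

def prune(tree, target):
    """Remove the subtree `target` and suppress its former parent.

    Assumes `target` is a proper subtree of `tree` (not the whole tree).
    """
    if isinstance(tree, str):
        return tree
    left, right = tree
    if left == target:
        return right          # parent is suppressed, sibling moves up
    if right == target:
        return left
    return (prune(left, target), prune(right, target))


def regraft(tree, target, onto):
    """Attach `target` on the edge above the subtree `onto`.

    Subdividing that edge makes `target` the sibling of `onto`.
    """
    if tree == onto:
        return (tree, target)
    if isinstance(tree, str):
        return tree
    left, right = tree
    return (regraft(left, target, onto), regraft(right, target, onto))


def rspr_move(tree, target, onto):
    """One rSPR move: prune `target`, then regraft it above `onto`.

    `onto` must be a subtree of the tree that remains after pruning.
    """
    return regraft(prune(tree, target), target, onto)


# Example: move leaf "a" so that it becomes the sibling of leaf "d".
T = ((("a", "b"), "c"), "d")
T_moved = rspr_move(T, "a", "d")
```

In this example a single move separates the two trees, so under these assumptions their rSPR distance is at most 1.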
