Calculating SPR distances between trees

The SPR distance between two trees is the minimum number of subtree pruning and regrafting (SPR) moves required to convert one tree into the other. Computing it has been proven to be an NP-complete problem. A heuristic to calculate SPR distances between trees is described. It performs favorably when compared with two existing heuristics, RIATA-HGT and EEEP. Compared with RIATA-HGT, the new method tends to produce better estimates when the trees are relatively similar, and worse estimates when the trees are very different (e.g., random trees). Its results are rather similar to those of EEEP, but it is orders of magnitude faster. A measure of tree similarity based on SPR distances is also proposed, obtained by calculating the minimum number of weighted SPR moves (with moves to closer nodes being less costly). The resulting measure of similarity is symmetric (i.e., Dij = Dji, for any two trees i, j).
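To make the definition of SPR distance concrete, the following is a minimal sketch of an exact, brute-force computation for very small rooted binary trees. It is not the heuristic described here, nor RIATA-HGT or EEEP: it simply runs a breadth-first search over all trees reachable by SPR moves, which is only feasible for a handful of leaves. The nested-tuple tree representation and the function names (`canon`, `spr_neighbors`, `spr_distance`) are assumptions made for illustration, and moves are unweighted.

```python
from collections import deque

# Assumed representation: a leaf is a string, an internal node is a
# 2-tuple of children. Leaf labels are assumed to be unique.

def canon(t):
    """Return t with the children of every node in a fixed order."""
    if isinstance(t, str):
        return t
    return tuple(sorted((canon(c) for c in t), key=repr))

def subtrees(t):
    """Yield t and every subtree below it."""
    yield t
    if not isinstance(t, str):
        for child in t:
            yield from subtrees(child)

def prune(t, s):
    """Remove proper subtree s from t; the sibling of s replaces their parent."""
    if isinstance(t, str):
        return t
    a, b = t
    if a == s:
        return b
    if b == s:
        return a
    return canon((prune(a, s), prune(b, s)))

def regraft(t, s, target):
    """Attach s as the sibling of subtree target inside t."""
    if t == target:
        return canon((t, s))
    if isinstance(t, str):
        return t
    return canon((regraft(t[0], s, target), regraft(t[1], s, target)))

def spr_neighbors(t):
    """All trees that differ from canonical tree t by exactly one SPR move."""
    out = set()
    for s in subtrees(t):
        if s == t:
            continue  # the whole tree cannot be pruned
        rest = prune(t, s)
        for target in subtrees(rest):
            moved = regraft(rest, s, target)
            if moved != t:
                out.add(moved)
    return out

def spr_distance(t1, t2):
    """Exact unweighted rooted SPR distance by breadth-first search."""
    start, goal = canon(t1), canon(t2)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        tree, dist = queue.popleft()
        if tree == goal:
            return dist
        for nxt in spr_neighbors(tree):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("trees must share the same leaf set")

balanced = (("A", "B"), ("C", "D"))
print(spr_distance(balanced, ("A", ("B", ("C", "D")))))  # one move: prune A, regraft at the root
print(spr_distance(balanced, (("A", "C"), ("B", "D"))))  # no single move suffices here
```

Because every SPR move can be undone by another SPR move, this unweighted distance is the same in both directions. The search space grows super-exponentially with the number of leaves, which is why heuristics are needed for trees of realistic size.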
