On divide-and-conquer strategies for parsimony analysis of large data sets: Rec-I-DCM3 versus TNT.

Roshan et al. recently described Rec-I-DCM3, a "divide-and-conquer" technique for parsimony analysis of large data sets, and stated that it compares very favorably with results obtained using the program TNT. Their technique selects subsets of taxa to create reduced data sets, or subproblems. It then finds most-parsimonious trees for each reduced data set, recombines all the parts, and performs global TBR swapping on the combined tree. Here we contrast this approach with sectorial searches, a divide-and-conquer algorithm implemented in TNT. Sectorial searches also use a guide tree to create subproblems, but each reduced data set includes the first-pass state sets of the nodes that connect the selected sector to the rest of the topology. This allows exact length calculations for the entire topology: any solution that is N steps shorter than the original for the reduced subproblem must also be N steps shorter for the entire topology. We show that, for sectors of similar size analyzed with the same search algorithms, subdividing data sets with sectorial searches produces better results than subdividing with Rec-I-DCM3. Roshan et al.'s claim that Rec-I-DCM3 outperforms the techniques in TNT resulted from poor experimental design and poorly chosen algorithmic settings for the TNT runs. In particular, TNT clearly outperforms Rec-I-DCM3 at finding trees at or very close to the minimum known length of the analyzed data sets. Finally, we show that the performance of Rec-I-DCM3 is bounded by the efficiency of the TBR implementation for the complete data set. After some number of iterations, the method behaves more like a technique of cyclic perturbation and improvement than like a divide-and-conquer strategy.
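The exact-length property of sectorial searches can be illustrated with a minimal Python sketch. The sketch makes several assumptions. Characters are unweighted (Fitch). The tree is binary and written as nested tuples. The tree is rooted inside the sector. Because the Fitch length of an unrooted binary tree does not depend on where the root is placed, this rooting loses no generality. With the root inside the sector, every part of the tree outside the sector hangs from it as a rooted subtree. Each such subtree is replaced by a pseudo-terminal that holds its first-pass (Fitch down-pass) state sets. The helper names `fitch`, `reduce_sector` and `expand`, the placeholder labels `X1` and `X2`, and the toy data are all hypothetical. This is a simplified illustration, not TNT's implementation.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# A tree is either a terminal name or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]
StateSets = List[FrozenSet[str]]  # one state set per character


def fitch(tree: Tree, data: Dict[str, StateSets]) -> Tuple[StateSets, int]:
    """Fitch first pass: per-character state sets at the root, and tree length."""
    if isinstance(tree, str):
        return data[tree], 0
    left_sets, left_len = fitch(tree[0], data)
    right_sets, right_len = fitch(tree[1], data)
    sets: StateSets = []
    steps = left_len + right_len
    for a, b in zip(left_sets, right_sets):
        common = a & b
        if common:
            sets.append(common)  # non-empty intersection: no extra step
        else:
            sets.append(a | b)   # empty intersection: take union, add one step
            steps += 1
    return sets, steps


def reduce_sector(subtrees: Dict[str, Tree], data: Dict[str, StateSets]):
    """Replace each outside subtree by a pseudo-terminal holding its first-pass sets.

    Returns the augmented data and the constant length of the outside subtrees.
    """
    reduced = dict(data)
    outside = 0
    for label, sub in subtrees.items():
        sets, steps = fitch(sub, data)
        reduced[label] = sets
        outside += steps
    return reduced, outside


def expand(tree: Tree, subtrees: Dict[str, Tree]) -> Tree:
    """Put the outside subtrees back in place of their pseudo-terminals."""
    if isinstance(tree, str):
        return subtrees.get(tree, tree)
    return (expand(tree[0], subtrees), expand(tree[1], subtrees))


if __name__ == "__main__":
    # Hypothetical toy data: 8 taxa, 4 binary characters.
    raw = {"A": "0011", "B": "0010", "C": "1100", "D": "1101",
           "E": "0110", "F": "0111", "G": "1000", "H": "1001"}
    data = {name: [frozenset(c) for c in seq] for name, seq in raw.items()}
    # Outside subtrees, held fixed while the sector is rearranged.
    subtrees = {"X1": ("A", "B"), "X2": ("C", "D")}
    reduced, outside = reduce_sector(subtrees, data)
    # Two alternative resolutions of the sector (the reduced subproblem).
    for sector in [((("X1", "E"), "X2"), (("F", "G"), "H")),
                   ((("X1", "X2"), "E"), (("F", "H"), "G"))]:
        full = fitch(expand(sector, subtrees), data)[1]
        part = fitch(sector, reduced)[1]
        print("full length:", full, " reduced + outside:", part + outside)
```

A pseudo-terminal with state set S enters Fitch's algorithm exactly as an outside subtree whose root set is S would. As a result, the full length always equals the reduced length plus the constant contributed by the outside subtrees. Any rearrangement that shortens the reduced tree by N steps therefore shortens the full tree by the same N steps. A subproblem built only from a subset of terminals does not, in general, have this property.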

[1] R. Meier, et al. Software Review. 2005.

[2] John M. Mellor-Crummey, et al. PRec-I-DCM3: A Parallel Framework for Fast and Accurate Large Scale Phylogeny Reconstruction. 2005, 11th International Conference on Parallel and Distributed Systems (ICPADS'05).

[3] Gonzalo Giribet, et al. Techniques in Molecular Systematics and Evolution. 2002, Methods and Tools in Biosciences and Medicine.

[4] James F. Smith. Phylogenetics of seed plants: An analysis of nucleotide sequences from the plastid gene rbcL. 1993.

[5] James S. Farris, et al. The full-length phylogenetic tree from 1551 ribosomal sequences of chitinous fungi, Fungi. 2003, Mycological Research.

[6] D. Hillis. Inferring complex phylogenies. 1996, Nature.

[7] Tandy Warnow, et al. Algorithmic techniques for improving the speed and accuracy of phylogenetic methods. 2004.

[8] Michael P. Cummings, et al. PAUP* [Phylogenetic Analysis Using Parsimony (and Other Methods)]. 2004.

[9] G. Giribet, et al. TNT: Tree Analysis Using New Technology. 2005.

[10] J. Farris. Methods for Computing Wagner Trees. 1970.

[11] P. Goloboff. Analyzing Large Data Sets in Reasonable Times: Solutions for Composite Optima. 1999, Cladistics: The International Journal of the Willi Hennig Society.

[12] Daniel H. Huson, et al. Disk-Covering, a Fast-Converging Method for Phylogenetic Tree Reconstruction. 1999, J. Comput. Biol.

[13] J. Farris, et al. Simultaneous parsimony jackknife analysis of 2538 rbcL DNA sequences reveals support for major clades of green plants, land plants, seed plants and flowering plants. 1998, Plant Systematics and Evolution.

[14] Albert Y. Zomaya, et al. High-Performance Phylogeny Reconstruction Under Maximum Parsimony. 2006.

[15] Albert Y. Zomaya. Parallel Computing for Bioinformatics and Computational Biology. 2005.

[16] Feng Lin, et al. Reconstruction of large phylogenetic trees: A parallel approach. 2005, Comput. Biol. Chem.

[17] Marc Smith, et al. Cooperative Rec-I-DCM3: A Population-Based Approach for Reconstructing Phylogenies. 2005, 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology.

[18] K. Nixon. The Parsimony Ratchet, a New Method for Rapid Parsimony Analysis. 1999.

[19] W. Wheeler. Alignment, dynamic homology, and optimization. 2006.

[20] W. Fitch. Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology. 1971.

[21] Daniel H. Huson, et al. Solving Large Scale Phylogenetic Problems using DCM2. 1999, ISMB.

[22] R. Henrik Nilsson, et al. Automated phylogenetic taxonomy: an example in the homobasidiomycetes (mushroom-forming fungi). 2005, Systematic Biology.

[23] Joel Cracraft, et al. Assembling the tree of life. 2004.

[24] Bernard M. E. Moret, et al. Rec-I-DCM3: a fast algorithmic technique for reconstructing phylogenetic trees. 2004, Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB 2004).

[25] M. Sanderson, et al. Phylogenetic supermatrix analysis of GenBank sequences from 2228 papilionoid legumes. 2006, Systematic Biology.

[26] D. Soltis, et al. Inferring complex phylogenies using parsimony: an empirical approach using three large DNA data sets for angiosperms. 1998, Systematic Biology.

[27] Victor A. Albert, et al. Parsimony, phylogeny, and genomics. 2006.

[28] Marc L. Smith, et al. Phylospaces: reconstructing evolutionary trees in tuple space. 2006, Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium.

[29] Alexandros Stamatakis, et al. Parallel Divide-and-Conquer Phylogeny Reconstruction by Maximum Likelihood. 2005, HPCC.

[30] D. Swofford. PAUP*: Phylogenetic analysis using parsimony (*and other methods), Version 4.0b10. 2002.