New Approaches to Compare Phylogenetic Search Heuristics

We present new insights into the behavior of two maximum parsimony heuristics for building evolutionary trees of different sizes. First, our results show that the heuristics find different classes of good-scoring trees, and these classes may have significant evolutionary implications. Second, we develop a new entropy-based measure to quantify the diversity among the evolutionary trees found by the heuristics. Overall, topological distance measures such as the Robinson-Foulds distance identify more diversity among a collection of trees than parsimony scores do. This suggests that more powerful heuristics could be designed by combining parsimony scores with topological distances. Understanding how phylogenetic heuristics behave can therefore guide the design of better heuristics and, ultimately, lead to more accurate evolutionary trees.
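The abstract contrasts parsimony scores with Robinson-Foulds (RF) distances and mentions an entropy-based diversity measure without defining it. The Python sketch below is a generic illustration under stated assumptions, not the authors' measure. Trees are hypothetical nested tuples. The RF distance is taken as the size of the symmetric difference between the two trees' sets of nontrivial bipartitions (some authors halve this or normalize by 2(n-3)). Diversity is plain Shannon entropy of the empirical distribution of a list of values. The helper names `bipartitions`, `robinson_foulds`, and `shannon_entropy`, as well as the tree shapes and scores, are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def _collect_clusters(node, out):
    """Return the leaf set under `node`, appending every internal cluster to `out`."""
    if isinstance(node, str):  # a leaf label
        return frozenset([node])
    below = frozenset().union(*(_collect_clusters(child, out) for child in node))
    out.append(below)
    return below


def bipartitions(tree):
    """Nontrivial splits of an unrooted tree written as nested tuples.

    Each split is stored as the side that excludes a fixed reference taxon,
    so the same split read off two different rootings compares equal.
    """
    found = []
    taxa = _collect_clusters(tree, found)
    ref = min(taxa)
    splits = set()
    for cluster in found:
        side = taxa - cluster if ref in cluster else cluster
        # Trivial splits (one side with fewer than two taxa) occur in every tree, so skip them.
        if 2 <= len(side) <= len(taxa) - 2:
            splits.add(side)
    return splits


def robinson_foulds(t1, t2):
    """Number of nontrivial splits found in exactly one of the two trees."""
    return len(bipartitions(t1) ^ bipartitions(t2))


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * log2(total / c) for c in counts.values())


# Hypothetical example: three 5-taxon trees assumed to tie on parsimony score.
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
    ((("A", "B"), "E"), ("C", "D")),
]
scores = [1200, 1200, 1200]  # made-up scores, all equal
rf = [robinson_foulds(a, b) for a, b in combinations(trees, 2)]

print(rf)                             # [2, 2, 4]
print(shannon_entropy(scores))        # 0.0
print(round(shannon_entropy(rf), 3))  # 0.918
```

With the made-up tied scores, score entropy is zero while the entropy of pairwise RF distances is positive. In miniature, this mirrors the abstract's observation that topology can distinguish trees that parsimony scores cannot.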
