Phylogenetic Analysis of Large Sequence Data Sets

Phylogenetic analysis is an integral part of biological research. As the number of sequenced genomes increases, available data sets are growing in both number and size. Several algorithms have been proposed to handle these larger data sets. A family of algorithms known as disc covering methods (DCMs) has been selected by the NSF-funded CIPRes project to boost the performance of existing phylogenetic algorithms. One member of this family, Recursive Iterative Disc Covering Method 3 (Rec-I-DCM3), recursively decomposes the guide tree into subtrees, executes a phylogenetic search on each subtree, and merges the resulting subtrees, repeating this cycle for a set number of iterations. This paper presents a detailed analysis of this algorithm.
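The decompose, search, and merge cycle described above can be illustrated with a minimal control-flow sketch. It is not the published implementation: the function names `solve` and `rec_i_dcm3_sketch`, the parameter `max_size`, and the three callables (`decompose`, `base_search`, `merge`) are hypothetical placeholders for a real decomposition rule, a real phylogenetic search, and a real subtree merger, none of which are modelled here.

```python
from typing import Callable, List, Sequence, TypeVar

Tree = TypeVar("Tree")   # whatever tree representation the caller uses
Taxa = Sequence[str]     # labels of the sequences in a (sub)problem


def solve(
    tree: Tree,
    taxa: Taxa,
    decompose: Callable[[Tree, Taxa], List[Taxa]],
    base_search: Callable[[Tree, Taxa], Tree],
    merge: Callable[[List[Tree]], Tree],
    max_size: int,
) -> Tree:
    """One recursive pass: split until subproblems are small, search, merge."""
    # Small enough: hand the subproblem to the existing phylogenetic search.
    if len(taxa) <= max_size:
        return base_search(tree, taxa)

    # Otherwise split the taxon set, guided by the current tree.
    subsets = decompose(tree, taxa)
    if not subsets or any(len(s) >= len(taxa) for s in subsets):
        # Guard against endless recursion with a faulty decomposition.
        raise ValueError("decompose must return strictly smaller subsets")

    # Recurse on every subset, then combine the resulting subtrees.
    subtrees = [
        solve(tree, s, decompose, base_search, merge, max_size)
        for s in subsets
    ]
    return merge(subtrees)


def rec_i_dcm3_sketch(
    guide_tree: Tree,
    taxa: Taxa,
    decompose: Callable[[Tree, Taxa], List[Taxa]],
    base_search: Callable[[Tree, Taxa], Tree],
    merge: Callable[[List[Tree]], Tree],
    max_size: int,
    iterations: int,
) -> Tree:
    """Repeat the recursive pass a fixed number of times.

    The tree produced by one iteration becomes the guide tree of the next.
    """
    tree = guide_tree
    for _ in range(iterations):
        tree = solve(tree, taxa, decompose, base_search, merge, max_size)
    return tree
```

Passing the three steps in as callables keeps the sketch independent of any particular search program or tree format; only the recursion and the iteration loop are fixed.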
