Parallel Divide-and-Conquer Phylogeny Reconstruction by Maximum Likelihood

Phylogenetic trees are important in biology because their applications range from determining protein function to understanding the evolution of species. Maximum Likelihood (ML) is a popular optimization criterion in phylogenetics. However, inference of phylogenies with ML is NP-hard. Recursive-Iterative-DCM3 (Rec-I-DCM3) is a divide-and-conquer framework that divides a dataset into smaller subsets (subproblems), applies an external base method to infer subtrees, merges the subtrees into a comprehensive tree, and then refines the global tree with an external global method. In this study we present a novel parallel implementation of Rec-I-DCM3 for inference of large trees with ML. Parallel-Rec-I-DCM3 (P-Rec-I-DCM3) uses RAxML as the external base and global search method. We evaluate program performance on six large real-data alignments containing between 500 and 7,769 sequences. Our experiments show that P-Rec-I-DCM3 reduces inference times and improves final tree quality over sequential Rec-I-DCM3 and stand-alone RAxML.
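
The four stages named in the abstract (divide, solve subproblems with a base method, merge, refine globally) can be illustrated with a minimal Python sketch of a single pass. Everything in it is an assumption made for illustration: the function `dcm_iteration`, its parameters, and the toy stand-ins are hypothetical, "trees" are represented as plain sorted tuples of taxon names, and the thread pool only indicates that independent subproblems are a natural place for concurrency. It is not the paper's parallelization scheme, it does not call RAxML, and unlike a real DCM3 decomposition the toy subsets do not overlap.

```python
from concurrent.futures import ThreadPoolExecutor


def dcm_iteration(taxa, decompose, base_method, merge, global_method, workers=4):
    """One divide / solve / merge / refine pass (illustrative only)."""
    # 1. Divide the dataset into smaller subsets (subproblems).
    subsets = decompose(taxa)
    # 2. Apply the base method to every subset; the subproblems are
    #    independent, so they are handed to a pool of workers here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        subtrees = list(pool.map(base_method, subsets))
    # 3. Merge the subtrees into one comprehensive tree.
    merged = merge(subtrees)
    # 4. Refine the merged tree with the global method.
    return global_method(merged)


# Toy stand-ins (hypothetical): a "tree" is just a sorted tuple of taxa.
def toy_decompose(taxa, size=3):
    # Real DCM3 subsets overlap; these simple chunks do not.
    return [list(taxa[i:i + size]) for i in range(0, len(taxa), size)]


def toy_base(subset):
    return tuple(sorted(subset))


def toy_merge(subtrees):
    return tuple(sorted({taxon for subtree in subtrees for taxon in subtree}))


def toy_global(tree):
    # Placeholder for a global search step; returns the tree unchanged.
    return tree


result = dcm_iteration(list("gfedcba"), toy_decompose, toy_base, toy_merge, toy_global)
```

In a real setting the base and global steps would be calls to an external ML program, and the pass would be repeated recursively and iteratively as the framework's name suggests.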
