Exploring New Search Algorithms and Hardware for Phylogenetics: RAxML Meets the IBM Cell

Phylogenetic inference is considered to be one of the grand challenges in Bioinformatics due to the immense computational requirements. RAxML is currently among the fastest and most accurate programs for phylogenetic tree inference under the Maximum Likelihood (ML) criterion. First, we introduce new tree search heuristics that accelerate RAxML by a factor of 2.43 while returning equally good trees. The performance of the new search algorithm has been assessed on 18 real-world datasets comprising 148 up to 4,843 DNA sequences. We then present the implementation, optimization, and evaluation of RAxML on the IBM Cell Broadband Engine. We address the problems and provide solutions pertaining to the optimization of floating point code, control flow, communication, and scheduling of multi-level parallelism on the Cell.

[1]  Thomas Ludwig,et al.  Parallel Inference of a 10.000-Taxon Phylogeny with Maximum Likelihood , 2004, Euro-Par.

[2]  Derrick J. Zwickl Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion , 2006 .

[3]  David A. Bader,et al.  Industrial applications of high-performance computing for phylogeny reconstruction , 2001, SPIE ITCom.

[4]  Alexandros Stamatakis,et al.  A Nuclear Ribosomal DNA Phylogeny of Acer Inferred with Maximum Likelihood, Splits Graphs, and Motif Analysis of 606 Sequences , 2006 .

[5]  John R Spear,et al.  Phylogenetic diversity and ecology of environmental Archaea. , 2005, Current opinion in microbiology.

[6]  John P. Huelsenbeck,et al.  MrBayes 3: Bayesian phylogenetic inference under mixed models , 2003, Bioinform..

[7]  Arndt von Haeseler,et al.  pIQPNNI: parallel reconstruction of large maximum likelihood phylogenies , 2005, Bioinform..

[8]  S. Asano,et al.  The design and implementation of a first-generation CELL processor , 2005, ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005..

[9]  Alexandros Stamatakis,et al.  Phylogenetic models of rate heterogeneity: a high performance computing perspective , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[10]  William Kahan,et al.  Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic , 1996 .

[11]  Alexandros Stamatakis,et al.  RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models , 2006, Bioinform..

[12]  Donald K. Berry,et al.  Parallel Implementation and Performance of FastDNAml - A Program for Maximum Likelihood Phylogenetic Inference , 2001, ACM/IEEE SC 2001 Conference (SC'01).

[13]  J. Felsenstein Evolutionary trees from DNA sequences: A maximum likelihood approach , 2005, Journal of Molecular Evolution.

[14]  Michael Gschwind,et al.  Optimizing Compiler for the CELL Processor , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).

[15]  Rosa M. Badia,et al.  CellSs: a Programming Model for the Cell BE Architecture , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[16]  Tamir Tuller,et al.  Maximum likelihood of evolutionary trees: hardness and approximation , 2005, ISMB.

[17]  O. Gascuel,et al.  A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. , 2003, Systematic biology.

[18]  I. Wald,et al.  Ray Tracing on the Cell Processor , 2006, 2006 IEEE Symposium on Interactive Ray Tracing.

[19]  Eoin L. Brodie,et al.  Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB , 2006, Applied and Environmental Microbiology.

[20]  Pedro Trancoso,et al.  Initial Experiences Porting a Bioinformatics Application to a Graphics Processor , 2005, Panhellenic Conference on Informatics.

[21]  Thomas Ludwig,et al.  RAxML-OMP: An Efficient Program for Phylogenetic Inference on SMPs , 2005, PaCT.

[22]  William J. Dally,et al.  Sequoia: Programming the Memory Hierarchy , 2006, International Conference on Software Composition.

[23]  Feng Lin,et al.  Reconstruction of large phylogenetic trees: A parallel approach , 2005, Comput. Biol. Chem..

[24]  Alexandros Stamatakis,et al.  Dynamic multigrain parallelization on the cell broadband engine , 2007, PPoPP.

[25]  Thomas Ludwig,et al.  RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees , 2005, Bioinform..

[26]  Scott R. Miller,et al.  Unexpected Diversity and Complexity of the Guerrero Negro Hypersaline Microbial Mat , 2006, Applied and Environmental Microbiology.

[27]  Alexandros Stamatakis,et al.  Distributed and parallel algorithms and systems for inference of huge phylogenetic trees based on the maximum likelihood method , 2004 .

[28]  Arndt von Haeseler,et al.  Large Maximum Likelihood Trees , 2006 .