FPGA acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods

Background
Maximum likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species from genomic sequence data. It is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior and no dependencies between iterations, so it offers high potential for exploiting parallelism through micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resulting co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors.

Results
We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, would achieve a 10× speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and using a natural log approximation chosen specifically to leverage a deeply pipelined custom architecture.

Conclusions
Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference, as shown by the growing body of literature in this field. FPGAs in particular are well suited to this task because of their low power consumption compared with many-core processors and Graphics Processing Units (GPUs) [1].
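To make the claim that the PLF loop has no conditionals and no cross-iteration dependencies concrete, here is a minimal Python sketch of the per-node likelihood update in Felsenstein's pruning algorithm. It assumes a 4-state nucleotide model, a single rate category, and the Jukes-Cantor substitution model purely for illustration; the function names (`jc69`, `plf_node`, `log_likelihood`) are hypothetical and are not taken from MrBayes or from the co-processor described here. Production code such as MrBayes also handles rate heterogeneity and numerical scaling, which this sketch omits.

```python
import math
from typing import List

Matrix = List[List[float]]   # 4x4 transition-probability matrix P(t)
CondLik = List[List[float]]  # conditional likelihoods: one length-4 row per site

N_STATES = 4  # nucleotide states A, C, G, T


def jc69(t: float) -> Matrix:
    """Jukes-Cantor P(t); an illustrative model choice, not the paper's."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if a == b else diff for b in range(N_STATES)]
            for a in range(N_STATES)]


def plf_node(p_left: Matrix, cl_left: CondLik,
             p_right: Matrix, cl_right: CondLik) -> CondLik:
    """Conditional likelihoods at a parent node from its two children.

    Every site runs the same fixed sequence of multiplies and adds, with no
    branches and no dependence on any other site, so sites can be streamed
    through a pipeline (or processed in parallel) in any order.
    """
    parent: CondLik = []
    for left_site, right_site in zip(cl_left, cl_right):
        row = []
        for a in range(N_STATES):
            left = sum(p_left[a][b] * left_site[b] for b in range(N_STATES))
            right = sum(p_right[a][b] * right_site[b] for b in range(N_STATES))
            row.append(left * right)
        parent.append(row)
    return parent


def log_likelihood(freqs: List[float], cl_root: CondLik) -> float:
    """Sum over sites of ln(sum_a pi_a * L_root[site][a])."""
    return sum(math.log(sum(f * l for f, l in zip(freqs, site)))
               for site in cl_root)


# Usage: two tips observed as A and C at one site, joined at the root.
TIP_A = [1.0, 0.0, 0.0, 0.0]
TIP_C = [0.0, 1.0, 0.0, 0.0]
root = plf_node(jc69(0.1), [TIP_A], jc69(0.2), [TIP_C])
lnL = log_likelihood([0.25] * 4, root)
```

The inner loop over states `a` and `b` is a fixed multiply-accumulate pattern, which is the kind of work that maps onto FPGA DSP blocks. The `math.log` call at the root corresponds to the natural-log step that the abstract says is approximated in hardware; this sketch simply uses the exact library function.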

[1] Alexandros Stamatakis, et al. A reconfigurable architecture for the Phylogenetic Likelihood Function, 2009, 2009 International Conference on Field Programmable Logic and Applications.

[2] J. Felsenstein. Evolutionary trees from DNA sequences: A maximum likelihood approach, 2005, Journal of Molecular Evolution.

[3] M. Berbee, et al. The phylogeny of plant and animal pathogens in the Ascomycota, 2001.

[4] Tsuyoshi Hamada, et al. PGR: a software package for reconfigurable super-computing, 2005, International Conference on Field Programmable Logic and Applications, 2005.

[5] N. Pace, et al. Perspectives on archaeal diversity, thermophily and monophyly from environmental rRNA sequences, 1996, Proceedings of the National Academy of Sciences of the United States of America.

[6] Marc A. Suchard, et al. Many-core algorithms for statistical phylogenetics, 2009, Bioinformatics.

[7] Manfred Binder, et al. Derivation of a polymorphic lineage of Gasteromycetes from boletoid ancestors, 2002, Mycologia.

[8] Alexandros Stamatakis, et al. Exploring FPGAs for accelerating the phylogenetic likelihood function, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[9] Viktor K. Prasanna, et al. High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs, 2007, IEEE Transactions on Parallel and Distributed Systems.

[10] Thomas Ludwig, et al. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees, 2005, Bioinformatics.

[11] Joseph Felsenstein, et al. The number of evolutionary trees, 1978.

[12] Alexandros Stamatakis, et al. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models, 2006, Bioinformatics.

[13] Khalid H. Abed, et al. CMOS VLSI Implementation of a Low-Power Logarithmic Converter, 2003, IEEE Transactions on Computers.

[14] Sandhya Dwarkadas, et al. Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference, 2002, Bioinformatics.

[15] Christophe Bobda, et al. Optimizing Logarithmic Arithmetic on FPGAs, 2007, 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2007).

[16] David Hewitt, et al. A five-gene phylogeny of Pezizomycotina, 2006, Mycologia.

[17] John P. Huelsenbeck, et al. MrBayes 3: Bayesian phylogenetic inference under mixed models, 2003, Bioinformatics.

[18] D. Robinson, et al. Comparison of phylogenetic trees, 1981.

[19] Thomas M. Keane, et al. DPRml: distributed phylogeny reconstruction by maximum likelihood, 2005, Bioinformatics.

[20] F. Lutzoni, et al. Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov chain Monte Carlo sampling and bootstrapping in assessing phylogenetic confidence, 2003, Molecular Biology and Evolution.

[21] Florent de Dinechin, et al. Generating high-performance custom floating-point pipelines, 2009, 2009 International Conference on Field Programmable Logic and Applications.

[22] Ziheng Yang. PAML 4: phylogenetic analysis by maximum likelihood, 2007, Molecular Biology and Evolution.

[23] Terrence S. T. Mak, et al. Embedded computation of maximum-likelihood phylogeny inference using platform FPGA, 2004, Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB 2004).

[24] Frank E. Anderson, et al. Bilaterian Phylogeny Based on Analyses of a Region of the Sodium–Potassium ATPase β-Subunit Gene, 2004, Journal of Molecular Evolution.

[25] Xizhou Feng, et al. Parallel algorithms for Bayesian phylogenetic inference, 2003, Journal of Parallel and Distributed Computing.

[26] Martin Vingron, et al. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing, 2002, Bioinformatics.

[27] Wayne Luk, et al. Optimizing Logarithmic Arithmetic on FPGAs, 2007.

[28] Arndt von Haeseler, et al. pIQPNNI: parallel reconstruction of large maximum likelihood phylogenies, 2005, Bioinformatics.

[29] Xizhou Feng, et al. Building the Tree of Life on Terascale Systems, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[30] Derrick J. Zwickl. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion, 2006.

[31] Ren-Cang Li, et al. Near optimality of Chebyshev interpolation for elementary function computations, 2004, IEEE Transactions on Computers.

[32] Robert Bauer, et al. Classicula: the teleomorph of Naiadella fluitans, 2003, Mycologia.

[33] Luay Nakhleh, et al. PhyloNet: a software package for analyzing and reconstructing reticulate evolutionary relationships, 2008, BMC Bioinformatics.