Efficient Implementation of MrBayes on Multi-GPU

MrBayes, which uses Metropolis-coupled Markov chain Monte Carlo (MCMCMC, or (MC)³), is a popular program for Bayesian phylogenetic inference. The (MC)³ Bayesian algorithm is a leading method for inferring phylogeny from DNA data, but neither it nor its improved and parallel versions are fast enough for biologists to analyze today's massive real-world DNA data sets. Recently, the graphics processing unit (GPU) has shown its power as a coprocessor (or, more precisely, an accelerator) in many fields. This article describes a(MC)³ (aMCMCMC), an efficient implementation of MrBayes (MC)³ on the Compute Unified Device Architecture (CUDA). By dynamically adjusting task granularity to the input data size and hardware configuration, it keeps the GPU cores fully occupied across different data sets. An adaptive method is also developed to split and combine DNA sequences so that a large number of GPU cards can be used fully. Furthermore, a new "node-by-node" task scheduling strategy is developed to improve concurrency, and several optimization methods are used to reduce extra overhead. Experimental results show that, relative to serial MrBayes, a(MC)³ achieves speedups of up to 63× on a single machine with one GPU card, up to 170× with four GPU cards, and up to 478× on a 32-node GPU cluster. Overall, a(MC)³ is dramatically faster than all previous (MC)³ algorithms and scales well to large GPU clusters.
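The coupling step that gives (MC)³ its name can be illustrated with a minimal sketch. It shows only the standard Metropolis-coupled algorithm, not a(MC)³'s GPU task scheduling, and it rests on several assumptions: a hypothetical one-dimensional bimodal density (`log_target`) stands in for the tree posterior, Gaussian random-walk proposals replace MrBayes's tree moves, and the incremental heating scheme β_i = 1/(1 + λi) uses an illustrative λ = 0.5 chosen for this toy. Each heated chain samples π(x)^β, and swapping the states of chains i and j is accepted with probability min{1, exp[(β_i − β_j)(ℓ_j − ℓ_i)]}, where ℓ is the log of the unnormalized posterior.

```python
import math
import random


def log_target(x: float) -> float:
    """Hypothetical toy log-density standing in for log(likelihood * prior).

    Unnormalized mixture of two unit-variance Gaussians centred at -4 and +4,
    evaluated with log-sum-exp so it never takes log(0).
    """
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mc3(n_chains=4, lam=0.5, n_gens=20000, step=1.0, seed=1):
    """Metropolis-coupled MCMC on the toy target; returns cold-chain samples."""
    rng = random.Random(seed)
    # Incremental heating: chain 0 is the cold chain (beta = 1).
    betas = [1.0 / (1.0 + lam * i) for i in range(n_chains)]
    states = [0.0] * n_chains
    logp = [log_target(s) for s in states]
    cold_samples = []
    swaps_accepted = 0

    for _ in range(n_gens):
        # 1. Independent Metropolis update within each chain.
        for c in range(n_chains):
            proposal = states[c] + rng.gauss(0.0, step)
            lp_new = log_target(proposal)
            # Heated target is pi(x)^beta, so the log ratio is scaled by beta.
            if rng.random() < math.exp(min(0.0, betas[c] * (lp_new - logp[c]))):
                states[c], logp[c] = proposal, lp_new

        # 2. Propose swapping the states of two distinct, randomly chosen chains.
        i, j = rng.sample(range(n_chains), 2)
        log_r = (betas[i] - betas[j]) * (logp[j] - logp[i])
        if rng.random() < math.exp(min(0.0, log_r)):
            states[i], states[j] = states[j], states[i]
            logp[i], logp[j] = logp[j], logp[i]
            swaps_accepted += 1

        # Only the cold chain's states are kept as posterior samples.
        cold_samples.append(states[0])

    return cold_samples, swaps_accepted
```

Running `samples, accepted = mc3()` should leave the cold chain visiting both modes, whereas a single Metropolis chain with the same step size tends to stay near whichever mode it reaches first. The heated chains flatten the barrier between modes, and the swap rule leaves the cold chain's stationary distribution unchanged.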