MrBayes for Phylogenetic Inference Using Protein Data on a GPU Cluster

MrBayes is a widely used software package for Bayesian phylogenetic inference: given biological sequence data from various taxonomic groups, it returns an estimate of the phylogenetic tree that gave rise to those taxa. This paper presents ta(MC)\(^{3}\), which builds on its predecessor a(MC)\(^{3}\) and, for protein datasets, improves computational efficiency and overcomes major obstacles to analyzing larger datasets on HPC systems with multiple Graphics Processing Units (GPUs). The major improvements are (a) a new task mapping strategy, (b) the use of Kahan summation to resolve non-convergence issues, and (c) the introduction of 64-bit variables. We evaluate ta(MC)\(^{3}\) on real-world protein datasets on both a desktop server and the Tianhe-1A supercomputer. With a single GPU, ta(MC)\(^{3}\) is nearly 90 times faster than the serial version of MrBayes, up to around 9 times faster than MrBayes using a GPU via the BEAGLE library, and up to 2.5 times faster than a(MC)\(^{3}\). On larger datasets with 64 nodes (GPUs) on Tianhe-1A, ta(MC)\(^{3}\) achieves a speedup of more than 1000 times over serial MrBayes.
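
For readers unfamiliar with improvement (b), the following is a minimal, generic sketch of Kahan (compensated) summation in Python. It is not code from ta(MC)\(^{3}\): the function names `naive_sum` and `kahan_sum` and the test data are illustrative assumptions, and the abstract does not say where in the likelihood computation the technique is applied. The sketch only shows how a running correction term recovers low-order bits that plain accumulation discards.

```python
def naive_sum(values):
    """Plain left-to-right accumulation (explicit loop, no compensation)."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan compensated summation.

    `comp` tracks the rounding error lost in each addition and feeds it
    back into the next term, so small contributions are not silently dropped.
    """
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in values:
        y = v - comp            # apply the correction to the incoming term
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost (negated)
        total = t
    return total


# Hypothetical data: one large term followed by many terms that are each
# too small to change the running total on their own in float64.
values = [1.0] + [1e-16] * 100_000

print("naive:", repr(naive_sum(values)))
print("kahan:", repr(kahan_sum(values)))
```

With this input the plain loop never moves away from 1.0, because each 1e-16 term is below half the floating-point spacing at 1.0, whereas the compensated sum stays close to the true value of about 1 + 1e-11. Accumulated error of this kind is one plausible way summation order and precision could affect numerical results, though the abstract itself gives no further detail on the non-convergence issue.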
