Optimizing the Bayesian Inference of Phylogeny on Graphic Processors

Searching for the evolutionary relationships between groups of organisms has become a routine procedure in molecular biology. MrBayes is a popular model-based phylogenetic inference tool that uses Bayesian statistics. Unfortunately, its computational cost is very high, resulting in undesirably long execution times. In this paper, we present what we believe to be the fastest implementation of the MrBayes MC3 algorithm running on off-the-shelf graphics processors. The performance gains come from a multi-granularity parallelism model, a coarse-grained GPU kernel system, an efficient thread arrangement strategy, and GPU code-level optimizations. MrBayes goMC3 (proposed herein) achieves a speedup of up to 48× over the sequential MrBayes MC3 when using a single Tesla C2075 GPU card, and a speedup of up to 77× when using dual GPUs. Compared with the state-of-the-art versions of other publicly available GPU implementations of MrBayes MC3, the cumulative optimizations adopted in goMC3 yield speedups of up to 2.5× over oMC3 (v1.0), 1.75× over tgMC3 (v1.0), and 1.46× over nMC3 (v2.1.1) on realistic empirical biological datasets. In addition, experimental results indicate that goMC3 outperforms these GPU implementations on simulated datasets composed of ultra-large-scale sequences. The reported performance improvement of goMC3 is therefore significant and appears to scale well with increasing dataset size.
