MrBayes 3.2.6 on Tianhe-1A: A High Performance and Distributed Implementation of Phylogenetic Analysis

Phylogenetic analysis has achieved extraordinary results in domains such as species delimitation and evolutionary biology. An essential element behind this success has been the introduction of high-performance computing techniques for estimating phylogenetic likelihoods. This paper describes the design and implementation of a distributed, heterogeneous CPU-GPU computing system for parallelizing the analysis. The parallelization has been implemented in the state-of-the-art version of MrBayes, a widely used phylogeny reconstruction program. We benchmarked the proposed method against two other GPU-based methods on 8 distributed computing nodes of Tianhe-1A. The experimental results indicate that the proposed method outperforms BEAGLE and the nMC3 method by speedup factors of up to 1.98× and 1.68×, respectively. Compared with the serial implementation of MrBayes, a peak speedup of 188× is achieved using 8 Tesla M2050 GPUs. The proposed method is publicly available to facilitate further research on phylogenetic analysis.
