Boosting the Performance of Bayesian Divergence Time Estimation with the Phylogenetic Likelihood Library

We present a substantially improved and parallelized version of DPPDiv, a software tool for estimating species divergence times and lineage-specific substitution rates on a fixed tree topology. The improvement is achieved by integrating the DPPDiv code with the Phylogenetic Likelihood Library (PLL), a fast, optimized, and parallelized collection of functions for conducting likelihood computations on phylogenetic trees. We show that integrating the PLL into a likelihood-based application is straightforward: it took the first author (DD) only one month of programming effort, without prior knowledge of either DPPDiv or the PLL. We achieve sequential speedups of a factor of two to three and near-optimal parallel speedups on up to 48 threads for sufficiently large datasets. Hence, with a programming effort of one month, we were able to improve DPPDiv's time-to-solution on parallel systems by two orders of magnitude and to substantially improve its ability to infer divergence times on large-scale datasets.
