Scalable Bayesian divergence time estimation with ratio transformations.

Divergence time estimation provides the temporal signal needed to date biologically important events, from species divergence to viral transmission in space and time. With the advent of high-throughput sequencing, recent Bayesian phylogenetic studies have analyzed hundreds to thousands of sequences. Such large-scale analyses challenge divergence time reconstruction because they require inference on highly correlated internal node heights, which often becomes computationally infeasible. To overcome this limitation, we explore a ratio transformation that maps the original N − 1 internal node heights into a space of one height parameter and N − 2 ratio parameters. To make analyses scalable, we develop a collection of linear-time algorithms to compute the gradient and Jacobian-associated terms of the log-likelihood with respect to these ratios. We then apply Hamiltonian Monte Carlo sampling with the ratio transform in a Bayesian framework to learn the divergence times in four pathogenic virus phylogenies: West Nile virus, rabies virus, Lassa virus and Ebola virus. Our method resolves a mixing issue in the West Nile virus example and improves inference efficiency by at least 5-fold for the Lassa and rabies virus examples. It also makes it computationally feasible to incorporate mixed-effects molecular clock models for the Ebola virus example; that analysis confirms the findings of the original study and reveals clearer multimodal distributions of the divergence times of some clades of interest.
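To make the ratio transformation concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes one common definition of the ratio, which the abstract itself does not spell out: each non-root internal node i gets r_i = (t_i − b_i) / (t_parent(i) − b_i), where b_i is the height of the oldest tip below node i. The tree encoding, function names and example tree are all hypothetical, chosen only for illustration. Both directions of the map take a single traversal of the tree.

```python
# Illustrative sketch only. Assumed ratio definition (not stated in the abstract):
#   r_i = (t_i - b_i) / (t_parent(i) - b_i),  b_i = height of the oldest tip below i.
# Trees are encoded as a dict: node -> list of children (tips have no entry).

def tip_bounds(children, tip_height, root):
    """Post-order pass: bound[i] is the height of the oldest tip below node i."""
    bound = {}

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            bound[node] = tip_height[node]
        else:
            bound[node] = max(visit(k) for k in kids)
        return bound[node]

    visit(root)
    return bound


def heights_to_ratios(children, height, root):
    """Map N - 1 internal node heights to (root height, N - 2 ratios)."""
    tips = {n: h for n, h in height.items() if not children.get(n)}
    bound = tip_bounds(children, tips, root)
    ratios = {}
    stack = [root]
    while stack:  # pre-order pass over internal nodes
        node = stack.pop()
        for kid in children.get(node, []):
            if children.get(kid):  # internal, non-root node
                # Assumes the parent is strictly older than the bound.
                ratios[kid] = (height[kid] - bound[kid]) / (height[node] - bound[kid])
                stack.append(kid)
    return height[root], ratios


def ratios_to_heights(children, tip_height, root, root_height, ratios):
    """Inverse map: rebuild every node height from the root height and ratios."""
    bound = tip_bounds(children, tip_height, root)
    height = dict(tip_height)
    height[root] = root_height
    stack = [root]
    while stack:  # parents are assigned before their children
        node = stack.pop()
        for kid in children.get(node, []):
            if children.get(kid):
                height[kid] = bound[kid] + ratios[kid] * (height[node] - bound[kid])
                stack.append(kid)
    return height


# Hypothetical 4-tip tree: 3 internal nodes -> 1 root height + 2 ratios.
children = {"root": ["ABC", "D"], "ABC": ["AB", "C"], "AB": ["A", "B"]}
tip_height = {"A": 0.0, "B": 1.0, "C": 0.0, "D": 0.5}
height = dict(tip_height, AB=2.0, ABC=3.0, root=5.0)

root_height, ratios = heights_to_ratios(children, height, "root")
recovered = ratios_to_heights(children, tip_height, "root", root_height, ratios)
```

Under this definition every ratio lies in (0, 1) whenever a node is younger than its parent and older than its oldest descendant tip, so a sampler can move the ratios without having to enforce ordering constraints between neighbouring node heights.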
