Scalable relaxed clock phylogenetic dating

Molecular clock models relate observed genetic diversity to calendar time, enabling estimation of times of common ancestry. Many large datasets of fast-evolving viruses are poorly fitted by molecular clock models that assume a constant substitution rate through time, so more flexible relaxed clock models are required for robust inference of rates and dates. Estimating relaxed molecular clocks with Bayesian Markov chain Monte Carlo is computationally expensive and may not scale well to large datasets. We build on recent advances in maximum likelihood and least-squares phylogenetic and molecular clock dating methods to develop a fast relaxed-clock method based on a Gamma-Poisson mixture model of substitution rates. This method estimates a distinct substitution rate for every lineage in the phylogeny while remaining scalable to large phylogenies. Unknown sample dates of lineages and an unknown root position can also be estimated. We estimate confidence intervals for rates, dates, and tip dates using parametric and non-parametric bootstrap approaches. The method is implemented as an open-source R package, treedater.
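
To make the Gamma-Poisson mixture concrete, the following is a minimal Python sketch of the general idea, not the treedater implementation. It assumes that the number of substitutions on a branch is Poisson distributed with mean equal to rate × branch duration × sequence length, and that the per-lineage rate is Gamma distributed; integrating out the rate gives a negative binomial distribution. The function name and the parameters `duration`, `seq_len`, `shape`, and `scale` are illustrative assumptions and do not come from the package.

```python
import math


def gamma_poisson_logpmf(x, duration, seq_len, shape, scale):
    """Log-probability of x substitutions on one branch (illustrative only).

    Assumed model:
      rate ~ Gamma(shape, scale)                 (substitutions/site/year)
      x | rate ~ Poisson(rate * duration * seq_len)
    Marginalising over the rate gives a negative binomial distribution.
    """
    # m rescales the Gamma scale parameter to units of expected substitutions.
    m = scale * duration * seq_len
    log_coef = (
        math.lgamma(x + shape) - math.lgamma(shape) - math.lgamma(x + 1)
    )
    return log_coef - shape * math.log1p(m) + x * (math.log(m) - math.log1p(m))


if __name__ == "__main__":
    # Hypothetical branch: 1.5 years long, 1000-site alignment,
    # mean rate shape * scale = 1e-3 substitutions/site/year.
    for x in range(5):
        print(x, math.exp(gamma_poisson_logpmf(x, 1.5, 1000, 2.0, 5e-4)))
```

Under these assumptions the expected number of substitutions is `shape * scale * duration * seq_len`, and the variance exceeds the mean, which is how a mixture of this kind accommodates rate variation among lineages that a strict clock cannot.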
