Temporal signal and the phylodynamic threshold of SARS-CoV-2

The ongoing SARS-CoV-2 outbreak marks the first time that large amounts of genome sequence data have been generated and made publicly available in near real time. Early analyses of these data revealed low sequence variation, a finding that is consistent with a recently emerging outbreak but that raises the question of whether such data are sufficiently informative for phylogenetic inference of evolutionary rates and time scales. The phylodynamic threshold is a key concept: it refers to the point in time at which sufficient molecular evolutionary change has accumulated in the available genome samples to obtain robust phylodynamic estimates. Before the phylodynamic threshold is reached, genomic variation is so low that even a large number of genome sequences may be insufficient to estimate the virus’s evolutionary rate and the time scale of an outbreak. We collected genome sequences of SARS-CoV-2 from public databases at eight different points in time and conducted a range of tests of temporal signal to determine if and when the phylodynamic threshold was reached, and which inferences could be reliably drawn from these data. Our results indicate that by 2 February 2020, estimates of evolutionary rates and time scales had become possible. Analyses of subsequent data sets, which included between 47 and 122 genomes, converged on an evolutionary rate of about 1.1×10⁻³ subs/site/year and a time of origin around late November 2019. Our study provides guidelines for assessing the phylodynamic threshold and demonstrates that establishing this threshold is a fundamental step in understanding the power and limitations of early data in outbreak genome surveillance.
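To make the idea of temporal signal concrete, the sketch below shows a root-to-tip regression, one commonly used informal check. The abstract does not say which tests were applied, so this is an illustration rather than the authors' method. The function name `root_to_tip_regression` and all input values are hypothetical: the distances are synthetic, generated from an assumed rate of 1.1×10⁻³ subs/site/year and an assumed origin at decimal year 2019.9, and are not taken from the study.

```python
from statistics import mean


def root_to_tip_regression(dates, distances):
    """Ordinary least squares fit of root-to-tip distance on sampling date.

    dates     -- sampling times in decimal years
    distances -- root-to-tip distances in substitutions per site
    Returns the slope (a crude rate estimate), the x-intercept (a crude
    time-of-origin estimate, or None if the slope is not positive) and R^2.
    """
    if len(dates) != len(distances) or len(dates) < 3:
        raise ValueError("need at least three paired observations")
    mx, my = mean(dates), mean(distances)
    sxx = sum((x - mx) ** 2 for x in dates)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dates, distances))
    syy = sum((y - my) ** 2 for y in distances)
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    slope = sxy / sxx
    intercept = my - slope * mx
    # With no variation in distance there is nothing to explain, so R^2 is 0.
    r2 = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    # A non-positive slope means the data carry no usable temporal signal.
    origin = -intercept / slope if slope > 0 else None
    return {"rate": slope, "origin": origin, "r2": r2}


# Synthetic, noise-free example (assumed values, for illustration only).
ASSUMED_RATE = 1.1e-3
ASSUMED_ORIGIN = 2019.9
sample_dates = [2020.00, 2020.02, 2020.05, 2020.08, 2020.10]
sample_distances = [ASSUMED_RATE * (d - ASSUMED_ORIGIN) for d in sample_dates]

fit = root_to_tip_regression(sample_dates, sample_distances)
print(fit)
```

On real early-outbreak data the points would scatter widely around the line, and a low R² or a non-positive slope would suggest that the phylodynamic threshold has not yet been reached. The regression treats tips as independent, which they are not, so it is best read as a visual diagnostic rather than a formal test.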
