Accommodating individual travel history, global mobility, and unsampled diversity in phylogeography: a SARS-CoV-2 case study.

Spatiotemporal bias in genome sequence sampling can severely confound phylogeographic inference based on discrete trait ancestral reconstruction. This has impeded our ability to accurately track the emergence and spread of SARS-CoV-2, the virus responsible for the COVID-19 pandemic. Despite the availability of unprecedented numbers of SARS-CoV-2 genomes on a global scale, evolutionary reconstructions are hindered by the slow accumulation of sequence divergence over the virus's relatively short transmission history. When confronted with these issues, incorporating additional contextual data may critically inform phylodynamic reconstructions. Here, we present a new approach to integrate individual travel history data in Bayesian phylogeographic inference and apply it to the early spread of SARS-CoV-2, while also including global air transportation data. We demonstrate that including travel history data for each SARS-CoV-2 genome yields more realistic reconstructions of virus spread, particularly when travelers from undersampled locations are included to mitigate sampling bias. We further explore methods to ameliorate the impact of sampling bias by augmenting the phylogeographic analysis with lineages from undersampled locations. Our reconstructions reinforce specific transmission hypotheses suggested by the inclusion of travel history data, but also suggest alternative routes of virus migration that are plausible within the epidemiological context but are not apparent with current sampling efforts. Although further research is needed to fully examine the performance of our travel-aware phylogeographic analyses with unsampled diversity and to further improve them, they represent multiple new avenues for directly addressing the colossal issue of sampling bias in phylogeographic inference.