Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography

Abstract

Discrete phylogeography using software such as BEAST treats the sampling location of each taxon as fixed, often as a single location without uncertainty. When studying viruses, this implies that there is no possibility that the infected host for that taxon was located somewhere else. Here, we relaxed this strong assumption and allowed for analytic integration of uncertainty for discrete virus phylogeography. We used automatic language processing methods to find and assign uncertainty to alternative potential locations. We considered two influenza case studies: H5N1 in Egypt and H1N1 pdm09 in North America. For each, we implemented scenarios in which 25 per cent of the taxa had different amounts of sampling uncertainty (10, 30, or 50 per cent) and varied how that uncertainty was distributed for each taxon. The scenarios: (i) placed a specific amount of uncertainty on one location while uniformly distributing the remaining amount across all other candidate locations (correspondingly labeled 10, 30, and 50); (ii) assigned the remaining uncertainty to just one other location, thus 'splitting' the uncertainty between two locations (i.e. 10/90, 30/70, and 50/50); and (iii) eliminated uncertainty via two predefined heuristic approaches: assignment to a centroid location (CNTR) or to the location with the largest population in the country (POP). We compared all scenarios to a reference standard (RS) in which all taxa had known (absolutely certain) locations. From the RS, we drew five random selections of 25 per cent of the taxa and used these for specifying uncertainty. We performed posterior analyses for each scenario, including: (a) virus persistence, (b) migration rates, (c) trunk rewards, and (d) the posterior probability of the root state. The scenarios with sampling uncertainty were closer to the RS than CNTR and POP.
For H5N1, the median absolute error of virus persistence ranged from 0.005 to 0.047 for scenarios with sampling uncertainty, (i) and (ii) above, versus 0.063 to 0.075 for CNTR and POP. Persistence for the pdm09 case study followed a similar trend, as did our analyses of migration rates across scenarios (i) and (ii). When considering the posterior probability of the root state, we found that all but one of the H5N1 scenarios with sampling uncertainty agreed with the RS on the origin of the outbreak, whereas both CNTR and POP disagreed. Our results suggest that assigning geospatial uncertainty to taxa benefits estimation of virus phylogeography compared with ad hoc heuristics. We also found that, in general, results differed little regardless of how the sampling uncertainty was assigned: distributing it uniformly or splitting it between two locations did not greatly affect posterior results. This framework is available in BEAST v1.10. In future work, we will explore viruses beyond influenza. We will also develop a web interface for researchers to use our language processing methods to find and assign uncertainty to alternative potential locations for virus phylogeography.
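
As an illustration only, scenarios (i) and (ii) can be sketched as per-taxon probability vectors over candidate locations. This is not the BEAST implementation: the function names, the location names, and the reading of each label (10, 30, 50) as the probability mass placed on one focal location are all assumptions made for this sketch.

```python
from typing import Dict, List


def uniform_scenario(focal: str, candidates: List[str], focal_mass: float) -> Dict[str, float]:
    """Scenario (i) sketch: put focal_mass on one location and spread the
    remainder uniformly over all other candidate locations.
    ASSUMPTION: the scenario label (e.g. 10) is the mass on the focal location."""
    if not 0.0 <= focal_mass <= 1.0:
        raise ValueError("focal_mass must lie in [0, 1]")
    others = [c for c in candidates if c != focal]
    if not others:
        raise ValueError("need at least one alternative candidate location")
    share = (1.0 - focal_mass) / len(others)
    probs = {c: share for c in others}
    probs[focal] = focal_mass
    return probs


def split_scenario(focal: str, alternative: str, candidates: List[str],
                   focal_mass: float) -> Dict[str, float]:
    """Scenario (ii) sketch: split all mass between exactly two locations
    (e.g. 10/90); every other candidate location gets zero."""
    if not 0.0 <= focal_mass <= 1.0:
        raise ValueError("focal_mass must lie in [0, 1]")
    if focal == alternative:
        raise ValueError("focal and alternative must differ")
    probs = {c: 0.0 for c in candidates}
    probs[focal] = focal_mass
    probs[alternative] = 1.0 - focal_mass
    return probs


# Hypothetical candidate locations, chosen only for illustration.
locations = ["Cairo", "Giza", "Luxor"]
uniform_50 = uniform_scenario("Cairo", locations, 0.50)
split_10_90 = split_scenario("Cairo", "Giza", locations, 0.10)
```

Each vector sums to one, so it can serve as a prior over a taxon's sampling location; scenario (iii) would correspond to a vector with all mass on a single heuristic location.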
