Incorporating heterogeneous sampling probabilities in continuous phylogeographic inference - Application to H5N1 spread in the Mekong region

MOTIVATION: The potentially low precision of the geographic origin of sampled sequences is an important limitation for spatially explicit (i.e. continuous) phylogeographic inference of fast-evolving pathogens such as RNA viruses. A substantial proportion of publicly available sequences are geo-referenced only at a broad spatial scale, for example the administrative unit of origin, rather than at more exact locations (e.g. GPS coordinates). Most frequently, such sequences are either discarded prior to continuous phylogeographic inference or arbitrarily assigned to the coordinates of the centroid of their administrative area of origin, for lack of a better option.

RESULTS: We implement and describe a new approach that incorporates heterogeneous prior sampling probabilities over a geographic area. External data, such as outbreak locations, are used to specify these prior sampling probabilities over a collection of sub-polygons. We apply this new method to the analysis of highly pathogenic avian influenza (HPAI) H5N1 clade data in the Mekong region. Our method makes it possible to include, in continuous phylogeographic analyses, H5N1 sequences that are associated only with large administrative areas of origin, and to assign them more accurate locations. Finally, we use continuous phylogeographic reconstructions to analyse the dispersal dynamics of different H5N1 clades and to investigate the impact of environmental factors on lineage dispersal velocities.

AVAILABILITY: Our new method, which allows heterogeneous sampling priors for continuous phylogeographic inference, is implemented in the open-source, multi-platform software package BEAST 1.10.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online and on figshare.com.
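To make the idea of heterogeneous prior sampling probabilities over sub-polygons concrete, the following Python sketch may help. It is only an illustration under stated assumptions and is not the BEAST 1.10 implementation: the function names, the use of outbreak counts normalised into probabilities, the uniform draw within a sub-polygon, and the coordinates are all hypothetical choices made for this example.

```python
import random


def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sampling_priors(outbreak_counts):
    """Normalise per-sub-polygon outbreak counts into prior probabilities.

    Assumption for this sketch: the prior is proportional to the count,
    and falls back to a uniform prior when no outbreaks are recorded.
    """
    total = sum(outbreak_counts)
    if total == 0:
        return [1.0 / len(outbreak_counts)] * len(outbreak_counts)
    return [count / total for count in outbreak_counts]


def draw_location(sub_polygons, priors, rng):
    """Pick a sub-polygon by its prior, then a uniform point inside it."""
    vertices = rng.choices(sub_polygons, weights=priors, k=1)[0]
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    # Rejection sampling within the bounding box of the chosen sub-polygon.
    while True:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, vertices):
            return x, y


# Hypothetical administrative area split into two sub-polygons (lon, lat).
SUB_POLYGONS = [
    [(105.0, 10.0), (106.0, 10.0), (106.0, 11.0), (105.0, 11.0)],
    [(106.0, 10.0), (107.0, 10.0), (107.0, 11.0), (106.0, 11.0)],
]
OUTBREAK_COUNTS = [9, 1]  # hypothetical external data

if __name__ == "__main__":
    rng = random.Random(1)
    priors = sampling_priors(OUTBREAK_COUNTS)
    print(priors, draw_location(SUB_POLYGONS, priors, rng))
```

In this sketch a sequence known only to come from the whole area would more often be placed in the sub-polygon with more recorded outbreaks, rather than always at the area's centroid. In an actual Bayesian analysis the location would be estimated jointly with the phylogeny instead of being drawn once from the prior.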
