Bayesian reconstruction of transmission within outbreaks using genomic variants

Pathogen genome sequencing can reveal details of transmission histories and is a powerful tool in the fight against infectious disease. In particular, within-host pathogen genomic variants identified through heterozygous nucleotide base calls are a potential source of information to identify linked cases and infer direction and time of transmission. However, using such data effectively to model disease transmission presents a number of challenges, including differentiating genuine variants from those observed due to sequencing error, as well as the specification of a realistic model for within-host pathogen population dynamics. Here we propose a new Bayesian approach to transmission inference, BadTrIP (BAyesian epiDemiological TRansmission Inference from Polymorphisms), that explicitly models evolution of pathogen populations in an outbreak, transmission (including transmission bottlenecks), and sequencing error. BadTrIP enables the inference of host-to-host transmission from pathogen sequencing data and epidemiological data. By assuming that genomic variants are unlinked, our method does not require the computationally intensive and unreliable reconstruction of individual haplotypes. Using simulations, we show that BadTrIP is robust in most scenarios and can accurately infer transmission events by efficiently combining information from genetic and epidemiological sources; thanks to its realistic model of pathogen evolution and the inclusion of epidemiological data, BadTrIP is also more accurate than existing approaches. BadTrIP is distributed as an open-source package (https://bitbucket.org/nicofmay/badtrip) for the phylogenetic software BEAST2. We apply our method to reconstruct transmission history at the early stages of the 2014 Ebola outbreak, showcasing the power of within-host genomic variants to reconstruct transmission events.
