A Bayesian model of acquisition and clearance of bacterial colonization incorporating within-host variation

Bacterial populations that colonize a host can play important roles in host health, including serving as a reservoir that transmits to other hosts and from which invasive strains emerge. Understanding the rates at which colonizing populations are acquired and cleared is therefore important. Studies of colonization dynamics have been based on assessing whether serial samples represent a single population or distinct colonization events. Where whole genome sequencing is used to determine the genetic distance between isolates, a common way to estimate acquisition and clearance rates has been to assume a fixed genetic distance threshold below which isolates are considered to represent the same strain. However, this approach often fails to account for the diversity of the underlying within-host evolving population, the time intervals between consecutive measurements, and the uncertainty in the estimated acquisition and clearance rates. Here, we present a fully Bayesian model that provides the probability that two strains should be considered the same, allowing us to determine bacterial clearance and acquisition from genomes sampled over time. Our method explicitly models within-host variation using population genetic simulation, and inference is done using a combination of approximate Bayesian computation (ABC) and Markov chain Monte Carlo (MCMC). We validate the method with multiple carefully conducted simulations and demonstrate its use in practice by analyzing a collection of methicillin-resistant Staphylococcus aureus (MRSA) isolates from a large, recently completed longitudinal clinical study. An R implementation of the method is freely available at: https://github.com/mjarvenpaa/bacterial-colonization-model.
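To illustrate why a fixed distance threshold can mislead when sampling intervals differ, the following toy sketch contrasts a threshold rule with a simple posterior probability of "same strain". It is not the model presented here: the Poisson likelihoods, the function names, and all parameter values (`threshold`, `mut_rate`, `base_diversity`, `between_mean`, `prior_same`) are assumptions chosen purely for illustration and are not estimates from any data.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability mass at k, computed in log space for stability."""
    if lam <= 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def threshold_same_strain(distance: int, threshold: int = 40) -> bool:
    """Fixed-threshold rule: ignores the time between the two samples.

    The default threshold is an arbitrary illustrative value.
    """
    return distance < threshold


def prob_same_strain(
    distance: int,
    interval_years: float,
    mut_rate: float = 8.0,        # assumed mutations per year (illustrative)
    base_diversity: float = 5.0,  # assumed within-host diversity at time zero
    between_mean: float = 60.0,   # assumed mean distance between distinct strains
    prior_same: float = 0.5,      # assumed prior probability of "same strain"
) -> float:
    """Toy posterior probability that two isolates are the same strain.

    Assumes Poisson-distributed distances under both hypotheses, with the
    same-strain mean growing linearly with the sampling interval.
    """
    lam_same = base_diversity + mut_rate * interval_years
    like_same = poisson_pmf(distance, lam_same)
    like_diff = poisson_pmf(distance, between_mean)
    numerator = prior_same * like_same
    return numerator / (numerator + (1.0 - prior_same) * like_diff)


if __name__ == "__main__":
    d = 30  # the same observed distance, seen at two different intervals
    for years in (0.1, 3.0):
        print(
            f"interval={years} y  threshold rule: {threshold_same_strain(d)}  "
            f"toy posterior: {prob_same_strain(d, years):.4f}"
        )
```

Under these assumed values, the threshold rule returns the same answer for both intervals, while the toy posterior separates them sharply: a distance of 30 is implausible for one strain sampled a few weeks apart but plausible after several years. The full method replaces these closed-form toy likelihoods with population genetic simulation and ABC.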
