Beyond the SNP threshold: identifying outbreak clusters using inferred transmissions

Whole-genome sequencing (WGS) is increasingly used to aid the understanding of pathogen transmission. A first step in analyzing WGS data is usually to define "transmission clusters," sets of cases that are potentially linked by direct transmission. This is often done by placing two cases in the same cluster if they are separated by fewer single-nucleotide polymorphisms (SNPs) than a specified threshold. However, there is little agreement as to what an appropriate threshold should be. We propose a probabilistic alternative, suggesting that the key inferential target for transmission clusters is the number of transmissions separating cases. We characterize this by combining the number of SNP differences and the length of time over which those differences have accumulated, using information about case timing, molecular clock, and transmission processes. Our framework has the advantage of allowing for variable mutation rates across the genome and can incorporate other epidemiological data. We use two tuberculosis studies to illustrate the impact of our approach: one from British Columbia, where we use spatial divisions, and one from the Republic of Moldova, where we incorporate antibiotic resistance. Simulation results indicate that our transmission-based method is better at identifying direct transmissions than a SNP threshold, with an average dissimilarity between clusterings of 0.27 bits, compared with 0.37 bits for the SNP-threshold method and 0.84 bits for randomly permuted data. These results show that the method is likely to outperform the SNP-threshold method where clock rates are variable and sample collection times are spread out. We implement the method in the R package transcluster.
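
To make the baseline concrete, the following is a minimal Python sketch of SNP-threshold clustering as described above (two cases are linked when they differ by fewer SNPs than the threshold, and clusters are the resulting connected groups), together with one information-based dissimilarity between clusterings measured in bits, the variation of information. It is not the transcluster implementation. The function names, the toy distance matrix, and the choice of variation of information as the dissimilarity are assumptions made for illustration only.

```python
import math
from collections import Counter


def snp_threshold_clusters(dist, threshold):
    """Label cases so that i and j share a cluster whenever a chain of
    pairs with dist < threshold connects them (single linkage).
    `dist` is a symmetric matrix of pairwise SNP differences."""
    n = len(dist)
    parent = list(range(n))  # union-find forest, one node per case

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:  # "fewer SNPs than the threshold"
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


def variation_of_information(a, b):
    """Variation of information H(A|B) + H(B|A), in bits, between two
    clusterings given as equal-length label lists."""
    n = len(a)
    size_a, size_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    vi = 0.0
    for (x, y), n_xy in joint.items():
        p_xy = n_xy / n
        vi -= p_xy * (math.log2(n_xy / size_a[x]) + math.log2(n_xy / size_b[y]))
    return vi


# Hypothetical pairwise SNP distances for four cases (illustrative only).
toy_dist = [
    [0, 2, 9, 10],
    [2, 0, 8, 11],
    [9, 8, 0, 3],
    [10, 11, 3, 0],
]
tight = snp_threshold_clusters(toy_dist, threshold=5)
loose = snp_threshold_clusters(toy_dist, threshold=9)
print(tight, loose, variation_of_information(tight, loose))
```

On this toy input, a threshold of 5 yields two clusters and a threshold of 9 merges all four cases into one, so the two clusterings differ by 1 bit. This illustrates why the choice of threshold matters: the same data can produce different clusters under different, equally defensible cutoffs.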
