Beyond the SNP threshold: identifying outbreak clusters using inferred transmissions

Whole genome sequencing (WGS) is increasingly used to aid the understanding of pathogen transmission. A first step in analysing WGS data is usually to define “transmission clusters”: sets of cases that are potentially linked by direct transmission. This is often done by placing two cases in the same cluster if they are separated by fewer SNPs than a specified threshold. However, there is little agreement on what an appropriate threshold should be. We propose a probabilistic alternative, suggesting that the key inferential target for transmission clusters is the number of transmissions separating cases. We characterise this number by combining the count of SNP differences with the length of time over which those differences have accumulated, using information about case timing, the molecular clock and the transmission process. Our framework allows for variable mutation rates across the genome and can incorporate other epidemiological data. We use two tuberculosis studies to illustrate the impact of our approach: British Columbia data, where we use spatial divisions, and Republic of Moldova data, where we incorporate antibiotic resistance. Simulation results indicate that our transmission-based method is better at identifying direct transmissions than a SNP threshold, with dissimilarity between clusterings averaging 0.27 bits, compared to 0.37 bits for the SNP threshold method and 0.84 bits for randomly permuted data. These results show that the transmission-based method is likely to outperform the SNP threshold where clock rates are variable and sample collection times are spread out. We implement the method in the R package transcluster.
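The contrast between the two clustering rules can be sketched in a few lines of Python. This is a simplified illustration, not the model implemented in transcluster: it assumes that SNPs and transmissions both accumulate as Poisson processes along the lineage joining two cases, that the unknown time back to their common ancestor has a flat prior on a bounded grid, and that clusters are formed by single linkage. All function names, parameters and rate values here are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of exactly k events, computed on the log scale."""
    if mean == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def prob_at_most_k_transmissions(n_snps, delta_t, clock_rate, trans_rate, k,
                                 h_max=50.0, steps=2000):
    """P(at most k transmissions separate two cases | SNP distance, timing).

    Assumptions (illustrative only): the lineage joining the cases has total
    length delta_t + 2*h, where delta_t is the gap between sampling times and
    h is the unknown time from the earlier sample back to the common ancestor;
    h has a flat prior on [0, h_max]; SNPs arrive at clock_rate and
    transmissions at trans_rate, both per unit time.
    """
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        h = (i + 0.5) * h_max / steps          # midpoint of grid cell
        total_time = delta_t + 2.0 * h
        weight = poisson_pmf(n_snps, clock_rate * total_time)
        cdf = sum(poisson_pmf(j, trans_rate * total_time) for j in range(k + 1))
        numerator += weight * cdf
        denominator += weight
    return numerator / denominator


def single_linkage_clusters(n_cases, pair_scores, is_linked):
    """Group cases: any pair whose score passes is_linked shares a cluster."""
    parent = list(range(n_cases))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path halving
            x = parent[x]
        return x

    for (i, j), score in pair_scores.items():
        if is_linked(score):
            parent[find(i)] = find(j)

    groups = {}
    for case in range(n_cases):
        groups.setdefault(find(case), []).append(case)
    return sorted(groups.values())


# SNP-threshold rule: link pairs separated by fewer than 5 SNPs.
snp_distances = {(0, 1): 2, (1, 2): 3, (2, 3): 20}
snp_clusters = single_linkage_clusters(4, snp_distances, lambda d: d < 5)

# Transmission-based rule: link pairs likely to be within 2 transmissions.
# Sampling-time gaps (years) and both rates are made-up example values.
time_gaps = {(0, 1): 0.5, (1, 2): 4.0, (2, 3): 1.0}
link_probs = {
    pair: prob_at_most_k_transmissions(snp_distances[pair], time_gaps[pair],
                                       clock_rate=0.5, trans_rate=1.0, k=2)
    for pair in snp_distances
}
transmission_clusters = single_linkage_clusters(4, link_probs, lambda p: p > 0.5)
```

Under these assumptions, the same SNP distance carries different weight depending on how far apart the samples were collected, which is the reason a fixed SNP cut-off and a transmission-based cut-off can disagree.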
