Estimating the relative probability of direct transmission between infectious disease patients.

BACKGROUND Estimating infectious disease parameters such as the serial interval (the time between symptom onset in primary and secondary cases) and the reproductive number (the average number of secondary cases produced by a primary case) is important for understanding infectious disease dynamics. Many estimation methods require linking cases by direct transmission, which is difficult for most diseases.

METHODS We use a subset of cases with detailed genetic and/or contact investigation data to build a training set of probable transmission events. From this, we develop a model that estimates the relative transmission probability for all case-pairs from demographic, spatial and clinical data. Our method is based on naive Bayes, a machine learning classification algorithm that uses the observed frequencies in the training dataset to estimate the probability that a pair is linked given a set of covariates.

RESULTS In simulations, probabilities estimated using genetic distance between cases to define training transmission events distinguish truly linked from unlinked pairs with high accuracy (area under the receiver operating characteristic curve of 95%). Moreover, only a subset of cases (10-50%, depending on sample size) needs detailed genetic data for the method to perform well. We show how these probabilities can be used to estimate the average effective reproductive number, and we apply the method to a tuberculosis outbreak in Hamburg, Germany.

CONCLUSIONS Our method is a novel way to infer transmission dynamics in any dataset in which only a subset of cases has rich contact investigation and/or genetic data.
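The naive Bayes step can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions. Covariates are treated as categorical. Training labels (1 = probable transmission event) are assumed to come from a genetic-distance or contact-investigation rule. Laplace smoothing with `alpha=1` is an assumed choice. "Relative" probability is taken to mean rescaling each infectee's scores so they sum to 1 over its candidate infectors. All function names, covariate names (`household`, `gap`) and data are hypothetical.

```python
from collections import Counter, defaultdict
from math import exp, log


def train_naive_bayes(pairs, labels, alpha=1.0):
    """Fit a categorical naive Bayes model from labelled training pairs.

    pairs  -- list of dicts, covariate name -> category, one dict per case-pair
    labels -- 1 if the pair is a probable transmission event (assumed to be set
              from genetic distance or contact data), else 0; both must occur
    alpha  -- Laplace smoothing pseudo-count (assumed default)
    """
    class_counts = Counter(labels)
    counts = {c: defaultdict(Counter) for c in (0, 1)}  # counts[class][covariate][category]
    levels = defaultdict(set)                            # observed categories per covariate
    for x, y in zip(pairs, labels):
        for name, value in x.items():
            counts[y][name][value] += 1
            levels[name].add(value)
    return class_counts, counts, levels, alpha


def prob_linked(model, x):
    """P(linked | covariates x), treating covariates as independent within each class."""
    class_counts, counts, levels, alpha = model
    n = sum(class_counts.values())
    log_score = {}
    for c in (0, 1):
        s = log(class_counts[c] / n)  # prior: class frequency in the training set
        for name, value in x.items():
            k = max(len(levels[name]), 1)
            # smoothed frequency of this category among training pairs of class c
            s += log((counts[c][name][value] + alpha) / (class_counts[c] + alpha * k))
        log_score[c] = s
    m = max(log_score.values())  # subtract the max before exponentiating, for stability
    w = {c: exp(s - m) for c, s in log_score.items()}
    return w[1] / (w[0] + w[1])


def relative_probabilities(scores):
    """Rescale so each infectee's probabilities over candidate infectors sum to 1.

    scores -- dict (infector, infectee) -> P(linked)
    """
    totals = defaultdict(float)
    for (_, infectee), s in scores.items():
        totals[infectee] += s
    return {pair: s / totals[pair[1]] for pair, s in scores.items()}


# Toy, made-up training pairs with two hypothetical covariates.
train_x = [
    {"household": "same", "gap": "short"},
    {"household": "same", "gap": "short"},
    {"household": "same", "gap": "long"},
    {"household": "diff", "gap": "short"},
    {"household": "diff", "gap": "long"},
    {"household": "diff", "gap": "long"},
    {"household": "diff", "gap": "long"},
    {"household": "same", "gap": "long"},
]
train_y = [1, 1, 1, 0, 0, 0, 0, 0]
model = train_naive_bayes(train_x, train_y)

# Score the two candidate infectors of hypothetical case "C", then rescale.
pair_x = {
    ("A", "C"): {"household": "same", "gap": "short"},
    ("B", "C"): {"household": "diff", "gap": "long"},
}
scores = {pair: prob_linked(model, x) for pair, x in pair_x.items()}
rel = relative_probabilities(scores)
```

In this sketch, dividing by the per-infectee total is what makes the output relative. The rescaled values rank candidate infectors of the same case rather than serving as calibrated absolute probabilities.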

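Turning pairwise probabilities into an average effective reproductive number can be sketched in the spirit of Wallinga and Teunis (2004). The sketch makes three assumptions. First, each case's expected number of secondary cases, R_i, is the sum of its relative probabilities of having infected other cases. Second, R_i values are averaged within time periods. Third, the period-level estimates are averaged over a chosen window that excludes the edges of the study. The abstract does not say this is exactly the authors' estimator, and the function names, period coding and window are hypothetical.

```python
from collections import defaultdict


def reproductive_numbers(rel_probs, period):
    """Wallinga-Teunis-style estimates from pairwise relative probabilities.

    rel_probs -- dict (infector, infectee) -> relative probability that `infector`
                 infected `infectee`; assumed to sum to 1 over infectors per infectee
    period    -- dict case ID -> time period (e.g. month index) of that case
    Returns (r_case, r_period): per-case R_i and the mean R_i within each period.
    """
    r_case = {c: 0.0 for c in period}
    for (infector, _infectee), p in rel_probs.items():
        r_case[infector] += p  # expected number of secondary cases of `infector`
    by_period = defaultdict(list)
    for c, t in period.items():
        by_period[t].append(r_case[c])
    r_period = {t: sum(v) / len(v) for t, v in sorted(by_period.items())}
    return r_case, r_period


def average_r(r_period, start, end):
    """Average period-level estimates over an assumed window [start, end]."""
    vals = [r for t, r in r_period.items() if start <= t <= end]
    return sum(vals) / len(vals)


# Toy, made-up example: four hypothetical cases in three periods.
period = {"A": 0, "B": 1, "C": 1, "D": 2}
rel_probs = {
    ("A", "B"): 1.0,
    ("A", "C"): 0.6, ("B", "C"): 0.4,
    ("B", "D"): 0.5, ("C", "D"): 0.5,
}
r_case, r_period = reproductive_numbers(rel_probs, period)
r_avg = average_r(r_period, 0, 1)  # drop the last period: its cases have no observed infectees
```

Restricting the window matters here. When every infectee's probabilities sum to 1, the mean of R_i over all cases equals the fraction of cases with at least one candidate infector, so it is close to 1 by construction. The estimate is only informative over periods in which both earlier infectors and later infectees were observed.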