A graph-based evidence synthesis approach to detecting outbreak clusters: An application to dog rabies

Early assessment of infectious disease outbreaks is key to implementing timely and effective control measures. In particular, rapidly recognising whether infected individuals stem from a single outbreak sustained by local transmission or from repeated introductions is crucial for choosing effective interventions. In this study, we introduce a new framework for combining several data streams, e.g. temporal, spatial and genetic data, to identify clusters of related cases of an infectious disease. Our method explicitly accounts for underreporting and allows the incorporation of preexisting information about the disease, such as its serial interval, spatial kernel, and mutation rate. We define, for each data stream, a graph connecting all cases, with edges weighted by the corresponding pairwise distance between cases. Each graph is then pruned by removing edges whose distance exceeds a given cutoff, defined based on preexisting information on the disease and assumptions on the reporting rate. The pruned graphs corresponding to different data streams are then merged by intersection to combine all data types; the connected components of the resulting graph define clusters of cases that are related for all types of data. Estimates of the reproduction number (the average number of secondary cases infected by an infectious individual in a large population) and of the rate of importation of the disease into the population are also derived. We test our approach on simulated data and illustrate it using data on dog rabies in the Central African Republic. We show that the outbreak clusters identified using our method are consistent with structures previously identified by more complex, computationally intensive approaches.
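
The prune-and-intersect step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy cases, the distance functions (absolute difference in days, Euclidean distance, Hamming distance) and the three cutoff values are assumptions chosen for illustration only. In the method itself the cutoffs are derived from the serial interval, spatial kernel, mutation rate and assumed reporting rate. The estimation of the reproduction number and importation rate is not covered here.

```python
from itertools import combinations
from math import hypot


def pruned_edges(cases, distance, cutoff):
    """Return the edges (i, j), i < j, whose pairwise distance is <= cutoff."""
    return {
        (i, j)
        for i, j in combinations(range(len(cases)), 2)
        if distance(cases[i], cases[j]) <= cutoff
    }


def connected_components(n, edges):
    """Connected components of an undirected graph on nodes 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical toy data: onset day, location (arbitrary units), short sequence.
cases = [
    {"day": 0, "xy": (0.0, 0.0), "seq": "AAAA"},
    {"day": 10, "xy": (1.0, 0.0), "seq": "AAAT"},
    {"day": 15, "xy": (1.5, 0.5), "seq": "AATT"},
    {"day": 12, "xy": (50.0, 50.0), "seq": "AAAT"},  # near in time and genetics, far in space
    {"day": 200, "xy": (0.5, 0.5), "seq": "CCGG"},   # near in space only
]


def temporal(a, b):
    return abs(a["day"] - b["day"])


def spatial(a, b):
    return hypot(a["xy"][0] - b["xy"][0], a["xy"][1] - b["xy"][1])


def genetic(a, b):
    # Hamming distance between equal-length sequences.
    return sum(x != y for x, y in zip(a["seq"], b["seq"]))


# Illustrative cutoffs (assumed values, not taken from the paper).
streams = [(temporal, 30), (spatial, 5.0), (genetic, 2)]

# Prune each graph, then keep only the edges present in every pruned graph.
edge_sets = [pruned_edges(cases, dist, cutoff) for dist, cutoff in streams]
merged = set.intersection(*edge_sets)

clusters = connected_components(len(cases), merged)
print(clusters)  # [[0, 1, 2], [3], [4]]
```

In this toy example, cases 0 to 2 form one cluster because they are within the cutoff for all three data types, whereas case 3 is excluded by the spatial graph alone and case 4 by the temporal and genetic graphs. Singleton components would correspond to cases not linked to any other observed case.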
