Building test data from real outbreaks for evaluating detection algorithms

Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining such data in sufficient quantity, with realistic shapes and covering a sufficient range of agents, sizes and durations, is known to be very difficult. The set of outbreak signals generated should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach that uses historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by a resampling process (binomial, inverse transform sampling method [ITSM], Metropolis-Hastings random walk, Metropolis-Hastings independent, Gibbs sampler, or hybrid Gibbs sampler). We carried out an analysis to identify the input parameters that matter most for simulation quality and to evaluate the performance of each resampling algorithm. Our analysis confirms that both the type of algorithm and the simulation parameters (i.e. number of days, number of cases, outbreak shape, overall scale factor) influence the results. We show that, regardless of the outbreaks, algorithms and metrics chosen for the evaluation, simulation quality decreased as the number of days simulated increased and improved as the number of cases simulated increased. Simulating outbreaks with fewer cases than days of duration (i.e. an overall scale factor less than 1) resulted in a substantial loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, the binomial and ITSM methods are accurate. Provided the simulation is kept within the range of plausible epidemiological curves faced by the surveillance system, our study confirms that the approach can be used to generate a large spectrum of outbreak signals.
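The two-step idea described above (rescale a historical epidemic curve, then resample it) can be illustrated with a minimal Python sketch. It is an illustration only, not the authors' implementation: the function names `rescale_curve` and `sample_outbreak_itsm`, the use of linear interpolation to stretch the time axis, and the example daily counts are all assumptions introduced here, and only the ITSM variant is shown.

```python
import bisect
import random
from itertools import accumulate


def rescale_curve(historical_counts, target_days):
    """Stretch or compress a daily case curve onto `target_days` days.

    Assumption: the time axis is rescaled by linear interpolation.
    Returns a probability distribution over the simulated days.
    """
    n = len(historical_counts)
    if n == 0 or target_days < 1:
        raise ValueError("need a non-empty curve and at least one day")
    if target_days == 1:
        return [1.0]
    weights = []
    for d in range(target_days):
        # Position of simulated day d on the original time axis.
        pos = d * (n - 1) / (target_days - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        weights.append((1 - frac) * historical_counts[lo]
                       + frac * historical_counts[hi])
    total = sum(weights)
    if total <= 0:
        raise ValueError("historical curve must contain at least one case")
    return [w / total for w in weights]


def sample_outbreak_itsm(probabilities, n_cases, rng):
    """Draw `n_cases` onset days by inverse transform sampling."""
    cdf = list(accumulate(probabilities))
    counts = [0] * len(probabilities)
    for _ in range(n_cases):
        u = rng.random()
        # First day whose cumulative probability exceeds u;
        # clamp to guard against floating-point round-off in the CDF.
        day = min(bisect.bisect_right(cdf, u), len(cdf) - 1)
        counts[day] += 1
    return counts


# Hypothetical historical outbreak: 8 days of daily case counts.
historical = [1, 3, 8, 12, 7, 4, 2, 1]
probs = rescale_curve(historical, target_days=14)
simulated = sample_outbreak_itsm(probs, n_cases=50, rng=random.Random(42))
print(simulated)
```

In this sketch the ratio `n_cases / target_days` plays the role of the overall scale factor mentioned above (an interpretation based on the abstract's wording). When that ratio falls below 1, most simulated days receive zero cases, which gives an intuition for why the shape of the historical curve is poorly preserved in that regime.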
