Fast subset scan for spatial pattern detection

Summary.  We propose a new ‘fast subset scan’ approach for accurate and computationally efficient event detection in massive data sets. We treat event detection as a search over subsets of data records, finding the subset which maximizes some score function. We prove that many commonly used functions (e.g. Kulldorff's spatial scan statistic and extensions) satisfy the ‘linear time subset scanning’ property, enabling exact and efficient optimization over subsets. In the spatial setting, we demonstrate that proximity‐constrained subset scans substantially improve the timeliness and accuracy of event detection, detecting emerging outbreaks of disease 2 days faster than existing methods.
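The 'linear time subset scanning' property named in the summary can be illustrated with a minimal sketch. It assumes the expectation-based Poisson score F(S) = C log(C/B) + B - C (for aggregate count C above aggregate baseline B, else 0) and the priority function c_i / b_i. The function names, counts, and baselines are hypothetical, chosen only for illustration; this is not the authors' implementation. Under these assumptions, the highest-scoring subset is one of the N prefixes of the records sorted by priority, so the scan scores N subsets rather than 2^N. The brute-force routine is included only to check the result on a small input.

```python
import math
from itertools import combinations


def ebp_score(count, baseline):
    """Expectation-based Poisson score for aggregate count C and baseline B."""
    if count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def ltss_scan(counts, baselines):
    """Score only the N priority-ordered prefixes; return (best score, best subset)."""
    # Sort record indices by priority c_i / b_i, highest first (assumed priority).
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i], reverse=True)
    best_score, best_k = 0.0, 0
    c_sum = b_sum = 0.0
    for k, i in enumerate(order, start=1):
        c_sum += counts[i]
        b_sum += baselines[i]
        score = ebp_score(c_sum, b_sum)
        if score > best_score:
            best_score, best_k = score, k
    return best_score, sorted(order[:best_k])


def brute_force_scan(counts, baselines):
    """Exhaustive search over all non-empty subsets; exponential, for checking only."""
    best_score, best_subset = 0.0, []
    n = len(counts)
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            c = sum(counts[i] for i in subset)
            b = sum(baselines[i] for i in subset)
            score = ebp_score(c, b)
            if score > best_score:
                best_score, best_subset = score, list(subset)
    return best_score, best_subset


# Hypothetical data: observed counts and expected baselines for six locations.
counts = [12, 3, 9, 5, 20, 4]
baselines = [5.0, 4.0, 8.0, 5.0, 9.0, 6.0]

fast_score, fast_subset = ltss_scan(counts, baselines)
slow_score, slow_subset = brute_force_scan(counts, baselines)
print(fast_subset, round(fast_score, 3))
```

On this input the prefix scan and the exhaustive search select the same subset. The sketch is unconstrained: the proximity-constrained scans described in the summary would apply a search of this kind within each local neighbourhood instead of over all records at once.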
