Fast subset scan for multivariate event detection

We present new subset scan methods for multivariate event detection in massive space-time datasets. We extend the recently proposed 'fast subset scan' framework from univariate to multivariate data, enabling computationally efficient detection of irregular space-time clusters even when the numbers of spatial locations and data streams are large. For two variants of the multivariate subset scan, we demonstrate that the scan statistic can be efficiently optimized over proximity-constrained subsets of locations and over all subsets of the monitored data streams, enabling timely detection of emerging events and accurate characterization of the affected locations and streams. Using our new fast search algorithms, we perform an empirical comparison of the Subset Aggregation and Kulldorff multivariate subset scans on synthetic data and real-world disease surveillance tasks, demonstrating tradeoffs between the detection and characterization performance of the two methods.
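As a rough illustration of why a scan statistic can be optimized over all subsets without enumerating them, the sketch below shows the univariate building block that the fast subset scan framework relies on. It rests on assumptions not stated in the abstract: an expectation-based Poisson score, a single data stream, and no proximity constraint. The function names (`ebp_score`, `fast_subset_scan`, `exhaustive_scan`) and the toy counts are hypothetical. It is not the multivariate algorithm of the paper. Locations are sorted by the ratio of observed count to expected baseline, and only the n prefixes of that ordering are scored rather than all 2^n subsets.

```python
import math
from itertools import combinations


def ebp_score(c, b):
    """Expectation-based Poisson log-likelihood ratio (assumed score function)
    for an aggregate observed count c and aggregate expected baseline b."""
    if b <= 0 or c <= b:
        return 0.0
    return c * math.log(c / b) + b - c


def fast_subset_scan(counts, baselines):
    """Score only the prefixes of the priority ordering (count / baseline).

    Returns (best_score, best_subset) where best_subset is a sorted list of
    location indices. Assumes all baselines are positive.
    """
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i],
                   reverse=True)
    best_score, best_k = 0.0, 0
    c_sum = b_sum = 0.0
    for k, i in enumerate(order, start=1):
        c_sum += counts[i]
        b_sum += baselines[i]
        score = ebp_score(c_sum, b_sum)
        if score > best_score:
            best_score, best_k = score, k
    return best_score, sorted(order[:best_k])


def exhaustive_scan(counts, baselines):
    """Brute-force check over every non-empty subset (exponential; toy sizes only)."""
    n = len(counts)
    best_score, best_subset = 0.0, []
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            c = sum(counts[i] for i in subset)
            b = sum(baselines[i] for i in subset)
            score = ebp_score(c, b)
            if score > best_score:
                best_score, best_subset = score, list(subset)
    return best_score, best_subset


# Hypothetical toy data: five locations, one data stream.
counts = [12, 5, 30, 8, 20]
baselines = [10, 10, 15, 8, 10]
fast_score, fast_subset = fast_subset_scan(counts, baselines)
brute_score, brute_subset = exhaustive_scan(counts, baselines)
print(fast_subset, round(fast_score, 3))
```

On this toy input the prefix search and the brute-force search select the same subset. In the multivariate setting described above, one would additionally need to search over subsets of streams and restrict locations to proximity-constrained neighborhoods, which this sketch does not attempt.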