Graph Topic Scan Statistic for Spatial Event Detection

Spatial event detection is an important and challenging problem. Unlike traditional event detection, which focuses on the timing of global urgent events, the task of spatial event detection is to detect the spatial regions (e.g., clusters of neighboring cities) where urgent events occur. In this paper, we focus on the problem of spatial event detection using textual information in social media. We observe that, when a spatial event occurs, the topics relevant to the event are often discussed more coherently in cities near the event location than in those far away. To capture this pattern, we propose a new method called Graph Topic Scan Statistic (Graph-TSS), which corresponds to a generalized log-likelihood ratio test based on topic modeling. We first show that detecting spatial event regions under Graph-TSS is NP-hard, via a reduction from the classical node-weighted prize-collecting Steiner tree problem (NW-PCST). We then design an efficient algorithm that approximately maximizes the graph topic scan statistic over spatial regions of arbitrary form. As case studies, we consider three applications using Twitter data: civil unrest event detection in Argentina, earthquake detection in Chile, and influenza outbreak detection in the United States. Empirical evidence demonstrates that the proposed Graph-TSS outperforms state-of-the-art methods in both running time and accuracy.
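To make the idea of maximizing a log-likelihood ratio statistic over connected regions of a graph concrete, the following is a minimal sketch under loudly stated assumptions. It is not Graph-TSS: the topic-model likelihood is replaced by the standard expectation-based Poisson statistic on per-city counts, and the approximation algorithm is replaced by a simple greedy region-growing heuristic with no approximation guarantee. All names (`ebp_score`, `greedy_scan`, the toy cities `A` to `D`) are hypothetical and chosen only for illustration.

```python
import math


def ebp_score(count, baseline):
    """Expectation-based Poisson log-likelihood ratio for one region.

    Assumption: this count-based statistic stands in for the
    topic-model-based statistic, which the abstract does not spell out.
    """
    if baseline <= 0 or count <= baseline:
        return 0.0  # no evidence of an elevated rate
    return count * math.log(count / baseline) + baseline - count


def greedy_scan(adjacency, counts, baselines):
    """Grow a connected region from every seed node, keep the best one.

    Heuristic only: at each step, add the neighboring node that most
    increases the score, and stop when no neighbor improves it.
    """
    best_region, best_score = frozenset(), 0.0
    for seed in adjacency:
        region = {seed}
        c, b = counts[seed], baselines[seed]
        score = ebp_score(c, b)
        while True:
            # Nodes adjacent to the region keep it connected when added.
            frontier = {v for u in region for v in adjacency[u]} - region
            pick, pick_score = None, score
            for v in sorted(frontier):
                s = ebp_score(c + counts[v], b + baselines[v])
                if s > pick_score:
                    pick, pick_score = v, s
            if pick is None:
                break
            region.add(pick)
            c += counts[pick]
            b += baselines[pick]
            score = pick_score
        if score > best_score:
            best_region, best_score = frozenset(region), score
    return best_region, best_score


# Toy path graph of four hypothetical cities: A - B - C - D.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
counts = {"A": 10, "B": 30, "C": 28, "D": 9}      # observed event-related posts
baselines = {"A": 10, "B": 10, "C": 10, "D": 10}  # expected posts under no event

region, score = greedy_scan(adjacency, counts, baselines)
print(sorted(region), round(score, 3))
```

On this toy input the two neighboring cities with elevated counts, B and C, form the highest-scoring connected region. A greedy search of this kind can miss the true maximizer on larger graphs, which is one reason a dedicated approximation algorithm is needed for an NP-hard objective.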
