A Model-Agnostic Framework for Fast Spatial Anomaly Detection

Given a spatial dataset placed on an n × n grid, our goal is to find the rectangular regions within which subsets of the dataset exhibit anomalous behavior. We develop algorithms that, given an arbitrary user-supplied likelihood function, conduct a likelihood ratio hypothesis test (LRT) over each rectangular region in the grid, rank all of the rectangles by their computed LRT statistics, and return the top few most interesting rectangles. To speed up this process, we develop methods to prune rectangles without computing their associated LRT statistics.
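
To make the scan described above concrete, the following is a minimal sketch of the exhaustive baseline: it scores every axis-aligned rectangle on the grid with an LRT statistic and returns the top k. It is not the paper's algorithm and performs no pruning. The Poisson count model, the per-cell `counts` and `pops` inputs, and all function names (`poisson_max_loglik`, `prefix_sums`, `rect_sum`, `top_rectangles`) are assumptions introduced for illustration. The likelihood is passed in as a function, so another model could be swapped in as long as it returns the maximized log-likelihood of a region from its aggregated statistics.

```python
import math
from itertools import combinations


def poisson_max_loglik(count, pop):
    """Maximized Poisson log-likelihood of a region (assumed model).

    Terms that cancel in the likelihood ratio (the log-factorials) are dropped.
    """
    if count == 0 or pop == 0:
        return 0.0
    return count * math.log(count / pop) - count


def prefix_sums(grid):
    """2-D prefix sums so that any rectangle sum costs O(1)."""
    n = len(grid)
    ps = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            ps[i + 1][j + 1] = grid[i][j] + ps[i][j + 1] + ps[i + 1][j] - ps[i][j]
    return ps


def rect_sum(ps, r0, c0, r1, c1):
    """Sum over the half-open rectangle [r0, r1) x [c0, c1)."""
    return ps[r1][c1] - ps[r0][c1] - ps[r1][c0] + ps[r0][c0]


def top_rectangles(counts, pops, max_loglik=poisson_max_loglik, k=3):
    """Score every rectangle with an LRT statistic and return the top k.

    Null hypothesis: one rate for the whole grid.
    Alternative: separate rates inside and outside the rectangle.
    """
    n = len(counts)
    pc, pp = prefix_sums(counts), prefix_sums(pops)
    total_c, total_p = pc[n][n], pp[n][n]
    null_ll = max_loglik(total_c, total_p)

    scored = []
    for r0, r1 in combinations(range(n + 1), 2):
        for c0, c1 in combinations(range(n + 1), 2):
            c_in = rect_sum(pc, r0, c0, r1, c1)
            p_in = rect_sum(pp, r0, c0, r1, c1)
            alt_ll = max_loglik(c_in, p_in) + max_loglik(total_c - c_in, total_p - p_in)
            scored.append((2.0 * (alt_ll - null_ll), (r0, c0, r1, c1)))

    # Rank rectangles from most to least anomalous.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```

Calling `top_rectangles(counts, pops, k=5)` on two n × n lists of lists returns `(statistic, (r0, c0, r1, c1))` pairs. The enumeration visits on the order of n^4 rectangles, which is the cost that pruning is meant to reduce.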
