Anomalous Window Discovery for Linear Intersecting Paths

This paper addresses the discovery of anomalous windows in linear intersecting paths. An anomalous window is a contiguous grouping of data points that is unusual with respect to the rest of the data. A linear path is a path represented by a line, with a one-dimensional spatial coordinate marking each observation point. We propose an approach for discovering anomalous windows using a class of algorithms based on scan statistics, specifically 1) an order-invariant algorithm using Scan Statistics for Linear Intersecting Paths (SSLIP), 2) Brute Force-SSLIP (BF-SSLIP), and 3) Central Brute Force-SSLIP (CBF-SSLIP). We further present two efficient variants of SSLIP: SSLIP*, which employs an upper bound on the scan window size, and SSLIP-Acc, which adopts an accelerator function to speed up the scan process. The proposed approach for discovering anomalous windows along linear paths comprises three distinct steps: 1) Cross Path Discovery, where we identify a subset of intersecting paths to be considered; 2) Anomalous Window Discovery, where we outline the various algorithms for traversing the cross paths to identify directional windows of varying size along the paths. For each window, an unusualness metric is computed in the form of a likelihood ratio that indicates how unusual the window is with respect to the rest of the data, and the window with the highest likelihood ratio is identified as the anomalous window; and 3) Monte Carlo Simulations, where, to ascertain whether this window is truly anomalous and not merely a random occurrence, we perform hypothesis testing by computing a p-value. We present extensive experimental results on real-world accident data sets for various highways with known issues (code and data available from [32], [27]). We also compare against current approaches [18], [34] to show the efficacy of our approach. Our results show that the approach is effective in identifying anomalous traffic accident windows along multiple intersecting highways.
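
The window-scoring and significance-testing steps described above can be illustrated with a minimal sketch. It is not the SSLIP family of algorithms: it scans a single path only, skips Cross Path Discovery, and assumes a Poisson likelihood ratio with uniform exposure per observation point, a model the abstract does not specify. The function names (`poisson_llr`, `best_window`, `monte_carlo_p_value`) and the sample counts are hypothetical, chosen only for illustration.

```python
import math
import random


def poisson_llr(c_in, n_in, c_tot, n_tot):
    """Log-likelihood ratio of one window under an assumed Poisson model.

    c_in / n_in are the event count and number of observation points inside
    the window; c_tot / n_tot are the same quantities for the whole path.
    """
    expected = c_tot * n_in / n_tot
    if c_in <= expected:
        return 0.0  # only windows with an elevated rate are of interest
    c_out = c_tot - c_in
    llr = c_in * math.log(c_in / expected)
    if c_out > 0:
        llr += c_out * math.log(c_out / (c_tot - expected))
    return llr


def best_window(counts, max_len):
    """Return (score, start, end) of the highest-scoring window [start, end)."""
    n, total = len(counts), sum(counts)
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)  # prefix sums give O(1) window counts
    best = (0.0, 0, 0)
    for start in range(n):
        for length in range(1, min(max_len, n - start) + 1):
            c_in = prefix[start + length] - prefix[start]
            score = poisson_llr(c_in, length, total, n)
            if score > best[0]:
                best = (score, start, start + length)
    return best


def monte_carlo_p_value(counts, max_len, n_sims=199, seed=0):
    """Estimate a p-value by rescanning data simulated under the null.

    Null hypothesis (an assumption of this sketch): every event is equally
    likely to fall on any observation point of the path.
    """
    rng = random.Random(seed)
    n, total = len(counts), sum(counts)
    observed = best_window(counts, max_len)[0]
    exceed = 0
    for _ in range(n_sims):
        sim = [0] * n
        for _ in range(total):
            sim[rng.randrange(n)] += 1
        if best_window(sim, max_len)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_sims + 1)


# Made-up accident counts per observation point along one path.
counts = [1, 0, 2, 1, 9, 8, 10, 1, 0, 2, 1, 1]
score, start, end = best_window(counts, max_len=4)
print(start, end, round(score, 3), monte_carlo_p_value(counts, max_len=4))
```

The `max_len` cap plays a role loosely analogous to the window-size bound attributed to SSLIP*, in that it limits how many windows are scored, though the paper's actual bound may be derived differently.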

[1] Li Wei, et al. SAXually Explicit Images: Finding Unusual Shapes, 2006, Sixth International Conference on Data Mining (ICDM'06).

[2] Eamonn J. Keogh, et al. Disk aware discord discovery: finding unusual time series in terabyte sized datasets, 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[3] Joseph Naus, et al. Multiple Window and Cluster Size Scan Procedures, 2004.

[4] Eamonn J. Keogh, et al. A symbolic representation of time series, with implications for streaming algorithms, 2003, DMKD '03.

[5] Eamonn J. Keogh, et al. HOT SAX: efficiently finding the most unusual time series subsequence, 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[6] Julian Besag, et al. The Detection of Clusters in Rare Diseases, 1991.

[7] Andrew W. Moore, et al. Detecting Significant Multidimensional Spatial Clusters, 2004, NIPS.

[8] Vijayalakshmi Atluri, et al. LS3: a Linear Semantic Scan Statistic technique for detecting anomalous windows, 2005, SAC '05.

[9] Martin Charlton, et al. A Mark 1 Geographical Analysis Machine for the automated analysis of point data sets, 1987, Int. J. Geogr. Inf. Sci.

[10] T. Tango, et al. A Flexibly Shaped Spatial Scan Statistic for Detecting Clusters, 2005, International Journal of Health Geographics.

[11] Joseph Glaz, et al. Multiple Window Discrete Scan Statistics, 2004.

[12] Arthur Getis, et al. Reflections on spatial autocorrelation, 2007.

[13] Robert Haining, et al. Spatial Data Analysis: Theory and Practice, 2003.

[14] R. T. Ogden, et al. Testing change-points with linear trend, 1994.

[15] W. R. Buckland, et al. Outliers in Statistical Data, 1979.

[16] W. F. Athas, et al. Evaluating cluster alarms: a space-time scan statistic and brain cancer in Los Alamos, New Mexico, 1998, American Journal of Public Health.

[17] Narayanaswamy Balakrishnan, et al. Scan Statistics and Applications, 2012.

[18] Jaideep Srivastava, et al. Event detection from time series data, 1999, KDD '99.

[19] Vijay S. Iyengar, et al. On detecting space-time clusters, 2004, KDD.

[20] J. Naus. The Distribution of the Size of the Maximum Cluster of Points on a Line, 1965.

[21] Ramakrishnan Srikant, et al. Fast Algorithms for Mining Association Rules in Large Databases, 1994, VLDB.

[22] Jae-Gil Lee, et al. Trajectory Outlier Detection: A Partition-and-Detect Framework, 2008, 2008 IEEE 24th International Conference on Data Engineering.

[23] Laura Firoiu, et al. Clustering Time Series with Hidden Markov Models and Dynamic Time Warping, 1999.

[24] W. Tobler. A Computer Movie Simulating Urban Growth in the Detroit Region, 1970.

[25] Kenji Yamanishi, et al. A unifying framework for detecting outliers and change points from non-stationary time series data, 2002, KDD.

[26] Renato Assunção, et al. A Simulated Annealing Strategy for the Detection of Arbitrarily Shaped Spatial Clusters, 2022.

[27] Hans-Peter Kriegel, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.

[28] Vijayalakshmi Atluri, et al. Random Walks to Identify Anomalous Free-Form Spatial Scan Windows, 2008, IEEE Transactions on Knowledge and Data Engineering.

[29] Alfred W Kotchi, et al. New Jersey’s Safe Corridor Program, 2007.

[30] D. Griffith. Spatial Autocorrelation: A Primer, 1987.

[31] H. Miller. Tobler's First Law and Spatial Analysis, 2004.

[32] Aryya Gangopadhyay, et al. Discretized Spatio-Temporal Scan Window, 2009, SDM.