Spatial outlier detection in heterogeneous neighborhoods

Spatial outlier detection approaches identify outliers by first defining a spatial neighborhood. However, existing approaches suffer from two issues: (1) they consider only autocorrelation when forming the neighborhood and ignore heterogeneity among spatial objects; and (2) when measuring how distinct an object is from its neighbors, they treat the attributes independently (whether a single attribute or multiple attributes) rather than considering the interrelationships among them. As a result, truly unusual spatial objects may go undetected, while frivolous outliers may be reported. In this paper, we revisit the computation of the spatial neighborhood and propose an approach that addresses both issues. We begin by identifying a spatially related neighborhood, which captures autocorrelation. We then consider the interrelationships between attributes as well as multiple, multilevel distributions within these attributes, thereby accounting for autocorrelation and heterogeneity in various forms. Finally, we identify outliers within these neighborhoods. Experimental results on several datasets (North Carolina SIDS data, New Mexico Leukemia data, etc.) indicate that our approach correctly identifies outliers in heterogeneous neighborhoods.
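
The abstract does not spell out the algorithm, so the sketch below is only a point of reference: a common multi-attribute baseline, not the authors' method. A k-nearest-neighbor set stands in for the spatially related neighborhood (autocorrelation), and each object's attribute vector is compared with its neighborhood average through a squared Mahalanobis distance, so the interrelationships among attributes enter through the covariance of the difference vectors rather than through one test per attribute. The function names, the choice of k, the toy grid data, and the plain-Python linear solver are illustrative assumptions. The sketch does not model the multiple, multilevel distributions (heterogeneity) that the paper adds on top.

```python
import math
from typing import List, Sequence, Tuple


def knn_neighbors(coords: Sequence[Tuple[float, float]], k: int) -> List[List[int]]:
    """Return, for each object, the indices of its k spatially nearest objects."""
    neighbors = []
    for i, (xi, yi) in enumerate(coords):
        # Stable sort: ties at equal distance are broken by index order.
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        neighbors.append(others[:k])
    return neighbors


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mahalanobis_scores(coords, attrs, k: int = 4) -> List[float]:
    """Squared Mahalanobis distance of each object's neighborhood difference vector."""
    nbrs = knn_neighbors(coords, k)
    p = len(attrs[0])
    # Difference between each object's attributes and the average of its neighbors.
    diffs = []
    for i, idx in enumerate(nbrs):
        avg = [sum(attrs[j][a] for j in idx) / len(idx) for a in range(p)]
        diffs.append([attrs[i][a] - avg[a] for a in range(p)])
    n = len(diffs)
    mu = [sum(d[a] for d in diffs) / n for a in range(p)]
    # Sample covariance of the differences: this is where the attributes are
    # considered jointly instead of one at a time.
    cov = [
        [sum((d[a] - mu[a]) * (d[b] - mu[b]) for d in diffs) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    scores = []
    for d in diffs:
        centered = [d[a] - mu[a] for a in range(p)]
        weights = solve(cov, centered)  # cov^{-1} @ centered, without forming the inverse
        scores.append(sum(c * w for c, w in zip(centered, weights)))
    return scores


if __name__ == "__main__":
    # Hypothetical 5x5 grid: attribute 2 tracks attribute 1, plus small deterministic noise.
    grid = [(x, y) for x in range(5) for y in range(5)]
    data = [[x + 0.5 * y, 2 * (x + 0.5 * y) + 0.1 * ((3 * x + 5 * y) % 7)] for x, y in grid]
    data[12][0] += 3.0  # plant an object whose attribute combination breaks the pattern
    data[12][1] -= 6.0
    scores = mahalanobis_scores(grid, data, k=4)
    print(max(range(len(scores)), key=scores.__getitem__))
```

Objects with the largest scores are the candidate outliers. Choosing a cutoff is a separate decision; if the difference vectors are roughly Gaussian, a chi-square quantile with p degrees of freedom is a common reference point.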
