Knowledge and Information Systems

We propose a measure, spatial local outlier measure (SLOM), which captures the local behaviour of datum in their spatial neighbourhood. With the help of SLOM, we are able to discern local spatial outliers that are usually missed by global techniques, like “three standard deviations away from the mean”. Furthermore, the measure takes into account the local stability around a data point and suppresses the reporting of outliers in highly unstable areas, where data are too heterogeneous and the notion of outliers is not meaningful. We prove several properties of SLOM and report experiments on synthetic and real data sets that show that our approach is novel and scalable to large datasets.

[1]  Chang-Tien Lu,et al.  Algorithms for spatial outlier detection , 2003, Third IEEE International Conference on Data Mining.

[2]  Chang-Tien Lu,et al.  Detecting spatial outliers with multiple attributes , 2003, Proceedings. 15th IEEE International Conference on Tools with Artificial Intelligence.

[3]  Stephen D. Bay,et al.  Mining distance-based outliers in near linear time with randomization and a simple pruning rule , 2003, KDD '03.

[4]  Shashi Shekhar,et al.  Spatial Databases: A Tour , 2003 .

[5]  Shashi Shekhar,et al.  A Unified Approach to Detecting Spatial Outliers , 2003, GeoInformatica.

[6]  Christos Faloutsos,et al.  LOCI: fast outlier detection using the local correlation integral , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[7]  R. Wilcox Applying Contemporary Statistical Techniques , 2003 .

[8]  Clara Pizzuti,et al.  Fast Outlier Detection in High Dimensional Spaces , 2002, PKDD.

[9]  Shashi Shekhar,et al.  Detecting graph-based spatial outliers: algorithms and applications (a summary of results) , 2001, KDD '01.

[10]  Philip S. Yu,et al.  Outlier detection for high dimensional data , 2001, SIGMOD '01.

[11]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[12]  Sridhar Ramaswamy,et al.  Efficient algorithms for mining outliers from large data sets , 2000, SIGMOD '00.

[13]  Raymond T. Ng,et al.  Algorithms for Mining Distance-Based Outliers in Large Datasets , 1998, VLDB.

[14]  M. Mcphaden El Niño and La Niña : Causes and Global Consequences , 2001 .

[15]  A. Madansky Identification of Outliers , 1988 .