On detecting spatial categorical outliers

Spatial outlier detection is an important research problem that has received much attention in recent years. Most existing approaches are designed for numerical attributes and are not applicable to categorical ones (e.g., binary, ordinal, and nominal), which are common in many applications. The main challenges are modeling spatial categorical dependency and achieving computational efficiency. This paper presents the first outlier detection framework for spatial categorical data. Specifically, a new metric, named Pair Correlation Ratio (PCR), is measured for each pair of category sets based on their co-occurrence frequencies at specific spatial distance ranges. The relevance between spatial objects is then calculated from the PCR values with regard to their spatial distances. The outlierness of each object is defined as the inverse of the average relevance between the object and its spatial neighbors. The objects with the highest outlier scores are returned as spatial categorical outliers. A set of algorithms is further designed for single-attribute and multi-attribute spatial categorical datasets. Extensive experimental evaluations on both simulated and real datasets demonstrate the effectiveness and efficiency of the proposed approaches.
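
The pipeline summarized in the abstract (PCR per category pair and distance range, relevance between objects, outlierness as inverse average relevance) can be illustrated with a minimal single-attribute sketch. The abstract does not give the exact PCR formula, so the sketch assumes one plausible form: the observed share of a category pair within a distance bin divided by the share expected if categories were assigned independently of location. The function names, the fixed-width distance bins, the k-nearest-neighbor neighborhood, and the `eps` smoothing term are all illustrative assumptions, not the paper's definitions.

```python
import math
from collections import Counter
from itertools import combinations


def pair_key(d, cat_a, cat_b, bin_width):
    """Key for an unordered category pair observed at distance d (assumed fixed-width bins)."""
    return (int(d // bin_width),) + tuple(sorted((cat_a, cat_b)))


def pair_correlation_ratios(points, cats, bin_width, max_dist):
    """Assumed PCR: observed / expected co-occurrence share per (bin, category pair)."""
    n = len(points)
    cat_freq = Counter(cats)
    pair_counts = Counter()  # (bin, cat_a, cat_b) -> number of object pairs
    bin_totals = Counter()   # bin -> number of object pairs of any categories
    for i, j in combinations(range(n), 2):
        d = math.dist(points[i], points[j])
        if d > max_dist:
            continue
        key = pair_key(d, cats[i], cats[j], bin_width)
        pair_counts[key] += 1
        bin_totals[key[0]] += 1
    pcr = {}
    for (b, a, c), observed in pair_counts.items():
        pa, pc = cat_freq[a] / n, cat_freq[c] / n
        # Share of unordered pairs expected under spatial independence.
        expected_share = pa * pc if a == c else 2 * pa * pc
        pcr[(b, a, c)] = observed / (bin_totals[b] * expected_share)
    return pcr


def outlier_scores(points, cats, bin_width, max_dist, k, eps=1e-9):
    """Outlierness = inverse of the average relevance to the k nearest neighbors."""
    pcr = pair_correlation_ratios(points, cats, bin_width, max_dist)
    n = len(points)
    scores = []
    for i in range(n):
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        # Relevance of a neighbor = PCR of the two categories at their distance bin.
        relevances = [
            pcr.get(pair_key(d, cats[i], cats[j], bin_width), 0.0)
            for d, j in nearest
            if d <= max_dist
        ]
        avg = sum(relevances) / len(relevances) if relevances else 0.0
        scores.append(1.0 / (avg + eps))  # eps avoids division by zero
    return scores


# Toy data: a 6x6 grid, "A" on the left half, "B" on the right half,
# with one "B" planted inside the "A" region at (1, 2).
points = [(x, y) for x in range(6) for y in range(6)]
cats = ["A" if x < 3 else "B" for x, y in points]
cats[points.index((1, 2))] = "B"
scores = outlier_scores(points, cats, bin_width=1.5, max_dist=1.5, k=8)
top = max(range(len(points)), key=scores.__getitem__)
print(points[top], cats[top])
```

In this toy layout the planted object is surrounded only by "A" neighbors, and the A/B pair is under-represented at short range, so its average relevance is the lowest and its score the highest. A multi-attribute variant would need some rule for combining per-attribute relevance, which the abstract does not specify.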
