Mining multidimensional contextual outliers from categorical relational data

A wide range of methods have been proposed for detecting different types of outliers in full space and subspaces. However, the interpretability of outliers, that is, explaining in what ways and to what extent an object is an outlier, remains a critical open issue. In this paper, we develop a notion of contextual outliers on categorical data. Intuitively, a contextual outlier is a small group of objects that share strong similarity with a significantly larger reference group of objects on some attributes, but deviate dramatically on some other attributes. We develop a detection algorithm, and conduct experiments to evaluate our approach.

[1]  Sameer Singh,et al.  Novelty detection: a review - part 1: statistical approaches , 2003, Signal Process..

[2]  Christos Faloutsos,et al.  Fast and reliable anomaly detection in categorical data , 2012, CIKM.

[3]  Ian Davidson,et al.  Discovering Contexts and Contextual Outliers Using Random Walks in Graphs , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[4]  John R. Anderson,et al.  A Learning System and Its Psychological Implications , 1979, IJCAI.

[5]  Raghu Ramakrishnan,et al.  Bottom-up computation of sparse and Iceberg CUBE , 1999, SIGMOD '99.

[6]  Li Wei,et al.  HOT: Hypergraph-Based Outlier Test for Categorical Data , 2003, PAKDD.

[7]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[8]  Donald E. Brown,et al.  An Outlier-based Data Association Method for Linking Criminal Incidents , 2003, SDM.

[9]  Laks V. S. Lakshmanan,et al.  Quotient Cube: How to Summarize the Semantics of a Data Cube , 2002, VLDB.

[10]  Reda Alhajj,et al.  A comprehensive survey of numeric and symbolic outlier mining techniques , 2006, Intell. Data Anal..

[11]  Guozhu Dong,et al.  Masquerader Detection Using OCLEP: One-Class Classification Using Length Statistics of Emerging Patterns , 2006, 2006 Seventh International Conference on Web-Age Information Management Workshops.

[12]  Philip K. Chan,et al.  A Machine Learning Approach to Anomaly Detection , 2003 .

[13]  Jilles Vreeken,et al.  The Odd One Out: Identifying and Characterising Anomalies , 2011, SDM.

[14]  Guizhen Yang,et al.  The complexity of mining maximal frequent itemsets and maximal frequent patterns , 2004, KDD.

[15]  Luigi Palopoli,et al.  Detecting outlying properties of exceptional objects , 2009, TODS.

[16]  Hamid Pirahesh,et al.  Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals , 1996, Data Mining and Knowledge Discovery.

[17]  Tom Fawcett,et al.  Adaptive Fraud Detection , 1997, Data Mining and Knowledge Discovery.

[18]  Jian Pei,et al.  CLOSET+: searching for the best strategies for mining frequent closed itemsets , 2003, KDD '03.

[19]  VARUN CHANDOLA,et al.  Anomaly detection: A survey , 2009, CSUR.

[20]  Victoria J. Hodge,et al.  A Survey of Outlier Detection Methodologies , 2004, Artificial Intelligence Review.

[21]  Philip K. Chan,et al.  Learning rules for anomaly detection of hostile network traffic , 2003, Third IEEE International Conference on Data Mining.

[22]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[23]  Sameer Singh,et al.  Novelty detection: a review - part 2: : neural network based approaches , 2003, Signal Process..

[24]  Hiroyuki Kitagawa,et al.  Detecting Outliers in Categorical Record Databases Based on Attribute Associations , 2008, APWeb.

[25]  Jeff G. Schneider,et al.  Anomaly pattern detection in categorical datasets , 2008, KDD.

[26]  Bernhard Ganter,et al.  Formal Concept Analysis: Mathematical Foundations , 1998 .

[27]  Emmanuel Müller,et al.  Statistical selection of relevant subspace projections for outlier ranking , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[28]  Milos Hauskrecht,et al.  Conditional Anomaly Detection with Soft Harmonic Functions , 2011, 2011 IEEE 11th International Conference on Data Mining.

[29]  Jinyan Li,et al.  Efficient mining of emerging patterns: discovering trends and differences , 1999, KDD '99.

[30]  James Bailey,et al.  Overview of Contrast Data Mining as a Field and Preview of an Upcoming Book , 2011, 2011 IEEE 11th International Conference on Data Mining Workshops.

[31]  Hans-Peter Kriegel,et al.  Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data , 2009, PAKDD.

[32]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[33]  Jeff G. Schneider,et al.  Detecting anomalous records in categorical datasets , 2007, KDD '07.

[34]  Andrew W. Moore,et al.  Rule-based anomaly pattern detection for detecting disease outbreaks , 2002, AAAI/IAAI.

[35]  Jung-Min Park,et al.  An overview of anomaly detection techniques: Existing solutions and latest technological trends , 2007, Comput. Networks.

[36]  Sanjay Ranka,et al.  Conditional Anomaly Detection , 2007, IEEE Transactions on Knowledge and Data Engineering.