An Optimization Model for Outlier Detection in Categorical Data

In this paper, we formally define outlier detection in categorical data as an optimization problem from a global viewpoint. We also present a local-search heuristic algorithm that finds feasible solutions efficiently. Experimental results on real datasets and large synthetic datasets demonstrate the superiority of our model and algorithm.
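To make the idea of a local-search heuristic over an optimization objective concrete, the following is a minimal sketch and not the paper's algorithm. The abstract does not state the objective function, so the sketch assumes one: the sum of per-attribute Shannon entropies of the records that remain after removing k candidate outliers. The names `total_entropy`, `local_search_outliers`, and `max_passes` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def total_entropy(rows):
    """Sum of per-attribute Shannon entropies (in bits) over the given rows.

    ASSUMPTION: this objective is illustrative; the abstract does not define one.
    """
    n = len(rows)
    if n == 0:
        return 0.0
    total = 0.0
    for column in zip(*rows):  # iterate attribute by attribute
        for count in Counter(column).values():
            p = count / n
            total -= p * math.log2(p)
    return total


def local_search_outliers(rows, k, max_passes=10):
    """Return (indices, objective) for k rows chosen by swap-based local search.

    Starts from an arbitrary set of k rows and repeatedly swaps one
    non-outlier with one outlier whenever the swap lowers the objective.
    The result is a local optimum, not necessarily the global one.
    """
    n = len(rows)
    outliers = set(range(k))  # arbitrary starting solution

    def objective(candidate):
        remaining = [r for i, r in enumerate(rows) if i not in candidate]
        return total_entropy(remaining)

    best = objective(outliers)
    for _ in range(max_passes):
        improved = False
        for i in range(n):
            if i in outliers:
                continue
            for j in sorted(outliers):
                candidate = (outliers - {j}) | {i}
                score = objective(candidate)
                if score < best - 1e-12:  # accept strictly improving swaps only
                    outliers, best = candidate, score
                    improved = True
                    break
        if not improved:
            break
    return sorted(outliers), best
```

The sketch recomputes the objective from scratch for every candidate swap, which keeps the code short but is far too slow for large datasets; an efficient version would update attribute-value counts incrementally.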
