FP-outlier: Frequent pattern based outlier detection

An outlier in a dataset is an observation or a point that is considerably dissimilar to or inconsistent with the remainder of the data. Detection of such outliers is important for many applications and has recently attracted much attention in the data mining research community. In this paper, we present a new method to detect outliers by discovering frequent patterns (or frequent itemsets) from the data set. The outliers are defined as the data transactions that contain less frequent patterns in their itemsets. We define a measure called FPOF (Frequent Pattern Outlier Factor) to detect the outlier transactions and propose the FindFPOF algorithm to discover outliers. The experimental results have shown that our approach outperformed the existing methods on identifying interesting outliers.

[1]  Marzena Kryszkiewicz,et al.  Concise Representation of Frequent Patterns Based on Generalized Disjunction-Free Generators , 2002, PAKDD.

[2]  Mohammed J. Zaki Scalable Algorithms for Association Mining , 2000, IEEE Trans. Knowl. Data Eng..

[3]  Marzena Kryszkiewicz Concise representation of frequent patterns based on disjunction-free generators , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[4]  Kenji Yamanishi,et al.  Discovering outlier filtering rules from unlabeled data: combining a supervised learner with an unsupervised learner , 2001, KDD '01.

[5]  Hongxing He,et al.  Outlier Detection Using Replicator Neural Networks , 2002, DaWaK.

[6]  Clara Pizzuti,et al.  Fast Outlier Detection in High Dimensional Spaces , 2002, PKDD.

[7]  Huan Liu,et al.  Discretization: An Enabling Technique , 2002, Data Mining and Knowledge Discovery.

[8]  Shian-Shyong Tseng,et al.  Two-phase clustering process for outliers detection , 2001, Pattern Recognit. Lett..

[9]  A. Madansky Identification of Outliers , 1988 .

[10]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[11]  Philip S. Yu,et al.  Outlier detection for high dimensional data , 2001, SIGMOD '01.

[12]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[13]  Sridhar Ramaswamy,et al.  Efficient algorithms for mining outliers from large data sets , 2000, SIGMOD '00.

[14]  P. Rousseeuw,et al.  Computing depth contours of bivariate point clouds , 1996 .

[15]  Raymond T. Ng,et al.  Algorithms for Mining Distance-Based Outliers in Large Datasets , 1998, VLDB.

[16]  Graham J. Williams,et al.  On-Line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms , 2000, KDD '00.

[17]  Rakesh Agarwal,et al.  Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.

[18]  Christos Faloutsos,et al.  LOCI: fast outlier detection using the local correlation integral , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[19]  Zengyou He,et al.  Discovering cluster-based local outliers , 2003, Pattern Recognit. Lett..

[20]  Raymond T. Ng,et al.  Finding Intensional Knowledge of Distance-Based Outliers , 1999, VLDB.

[21]  Wynne Hsu,et al.  Integrating Classification and Association Rule Mining , 1998, KDD.

[22]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[23]  W. R. Buckland,et al.  Outliers in Statistical Data , 1979 .

[24]  Prabhakar Raghavan,et al.  A Linear Method for Deviation Detection in Large Databases , 1996, KDD.

[25]  Li Wei,et al.  HOT: Hypergraph-Based Outlier Test for Categorical Data , 2003, PAKDD.

[26]  Aidong Zhang,et al.  FindOut: Finding Outliers in Very Large Datasets , 2002, Knowledge and Information Systems.

[27]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.