Fuzzy outlier analysis: a combined clustering-outlier detection approach

Many outlier detection methods identify outliers while ignoring any structure in the data. However, it is sometimes beneficial to integrate a measure of outlierness with a method that groups the data, such as clustering. Doing so can enhance both outlier analysis and cluster analysis. In this paper, a fuzzy approach is proposed for integrating the results of an outlier detection method and a clustering algorithm. A universal set of clusters is proposed, which combines the clusters obtained from clustering with a virtual cluster for the outliers. The approach has two phases: the first computes each pattern's initial membership in the outlier cluster, and the second calculates memberships in the universal clusters using an iterative membership propagation technique. The proposed approach is general and can combine any outlier detection method with any clustering algorithm. Both low- and high-dimensional data sets are used to illustrate the impact of the proposed approach.
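
The two phases described above can be illustrated with a minimal sketch. The abstract does not give the membership formulas or the propagation rule, so everything specific here is an assumption made for illustration and not the paper's method: outlier scores are min-max scaled to obtain initial outlier memberships, and memberships are propagated by averaging over each pattern's k nearest neighbors, with a weight `alpha` that pulls back toward the initial values. The function names, `k`, `alpha`, and `n_iter` are all hypothetical.

```python
import numpy as np

def initial_outlier_membership(scores):
    """Phase 1 (assumed rule): min-max scale raw outlier scores to [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span

def propagate_memberships(X, labels, outlier_scores, k=3, alpha=0.7, n_iter=20):
    """Phase 2 (assumed rule): kNN averaging over the universal clusters.

    Columns 0..c-1 are the clusters from the clustering algorithm;
    the last column is the virtual outlier cluster.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = len(X)
    n_clusters = int(labels.max()) + 1

    # Initial memberships: the outlier share comes from phase 1, and the
    # remainder goes to the cluster assigned by the clustering algorithm.
    u_out = initial_outlier_membership(outlier_scores)
    U0 = np.zeros((n, n_clusters + 1))
    U0[np.arange(n), labels] = 1.0 - u_out
    U0[:, -1] = u_out

    # Indices of the k nearest neighbors of each pattern (excluding itself).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :k]

    U = U0.copy()
    for _ in range(n_iter):
        neighbor_mean = U[nbrs].mean(axis=1)       # shape (n, c + 1)
        U = alpha * U0 + (1.0 - alpha) * neighbor_mean
        U /= U.sum(axis=1, keepdims=True)          # keep each row summing to 1
    return U

# Toy data: two tight groups and one distant pattern. The labels and scores
# stand in for the output of any clustering algorithm and any outlier detector.
X = [[0, 0], [0, 1], [1, 0], [1, 1],
     [10, 10], [10, 11], [11, 10], [11, 11],
     [5, 30]]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 5.0]

U = propagate_memberships(X, labels, scores)
print(np.round(U, 2))
```

Each row of `U` is a fuzzy membership vector over the universal clusters, so a pattern can be partly a cluster member and partly an outlier. Because the sketch only consumes labels and scores, any clustering algorithm and any outlier detector could supply them, which mirrors the generality claimed above.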
