A density oriented fuzzy C-means clustering algorithm for recognising original cluster shapes from noisy data

Many clustering algorithms in the literature are robust against outliers. They are robust in the sense that they reduce the influence of outliers on the cluster centroid locations, but they do not produce clean clusters, because the outliers are still included in the final clusters. The limitation of these algorithms is that they do not identify outliers. In this paper, we propose an algorithm, density oriented fuzzy C-means (DOFCM), which identifies outliers from the density of points in the dataset before creating clusters and produces 'n + 1' clusters: 'n' good clusters and one invalid cluster containing noise and outliers. The proposed technique is based on the idea that if outliers are not wanted in the clustering, their memberships should not take part in it. We therefore nullify the effect of outliers by assigning them a membership value of zero during clustering. DOFCM is applied to various synthetic datasets and to Bensaid's data, and is compared with well-known robust clustering techniques, namely PFCM, CFCM, and NC. In these comparisons, DOFCM performed best at recognising the original shape of clusters in noisy datasets.
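The two-stage idea described in the abstract (flag low-density points first, then cluster with those points held at zero membership) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the density criterion, so the neighbour count within a fixed `radius` and the `min_neighbors` threshold below are assumptions, as are all function names. The clustering step is standard fuzzy C-means run on the retained points only.

```python
import numpy as np

def density_outliers(X, radius, min_neighbors):
    """Flag points with fewer than min_neighbors other points within radius.
    The radius/threshold rule is an assumed density criterion, not the paper's."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    counts = (d <= radius).sum(axis=1) - 1  # subtract 1 to exclude the point itself
    return counts < min_neighbors

def fcm(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means; returns centers and the membership matrix U."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                      # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)   # each row sums to 1
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, U

def dofcm_sketch(X, n_clusters, radius, min_neighbors, **fcm_kwargs):
    """Hypothetical two-stage sketch: outliers get zero membership in every
    good cluster and are collected under the extra label n_clusters."""
    is_outlier = density_outliers(X, radius, min_neighbors)
    centers, U_inliers = fcm(X[~is_outlier], n_clusters, **fcm_kwargs)
    U = np.zeros((len(X), n_clusters))
    U[~is_outlier] = U_inliers
    labels = np.where(is_outlier, n_clusters, U.argmax(axis=1))
    return centers, U, labels
```

Calling `dofcm_sketch(X, n_clusters=2, radius=1.0, min_neighbors=3)` on a two-dimensional array `X` returns the centers of the good clusters, a membership matrix whose rows are all zero for flagged points, and labels in which the value `n_clusters` marks the extra noise cluster. Because the centroids are computed from the retained points only, flagged points cannot pull them, which is the effect the abstract describes.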
