Improvement of the fuzzy C-Means clustering algorithm with adaptive learning of the dissimilarities among categorical feature values

We recently proposed [1] a generalization of the frequency-based cluster prototype [2-4] for objects of mixed features, within the framework of the Fuzzy C-Means clustering algorithm. That generalization assumes a general dissimilarity measure for each categorical feature, rather than the simple matching dissimilarity. In this paper we develop an adaptive method that learns the dissimilarity measures for categorical features. We incorporate the method into the Fuzzy C-Means framework so that the clustering algorithm can use the learned dissimilarity measures in place of the simple matching dissimilarity measure. Experiments on real object sets show that the clustering quality improves.
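To illustrate why the choice of dissimilarity matters, the following minimal sketch compares Fuzzy C-Means memberships for one categorical feature under simple matching and under a general dissimilarity table. It is not the adaptive learning method of this paper. The function names, the representation of a prototype as a weight distribution over category values, the use of the weighted dissimilarity directly as the cost in the membership formula, and the table values are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Dissim = Callable[[str, str], float]


def simple_matching(a: str, b: str) -> float:
    """Simple matching dissimilarity: 0 if equal, 1 otherwise."""
    return 0.0 if a == b else 1.0


def table_dissim(table: Dict[Tuple[str, str], float]) -> Dissim:
    """Build a symmetric dissimilarity from a lookup table (hypothetical values)."""
    def d(a: str, b: str) -> float:
        if a == b:
            return 0.0
        # Unlisted pairs fall back to the simple matching value of 1.
        return table.get((a, b), table.get((b, a), 1.0))
    return d


def proto_dissim(value: str, prototype: Dict[str, float], d: Dissim) -> float:
    """Weighted dissimilarity between a value and a frequency-style prototype."""
    return sum(w * d(value, cat) for cat, w in prototype.items())


def memberships(value: str, prototypes: List[Dict[str, float]],
                d: Dissim, m: float = 2.0, eps: float = 1e-12) -> List[float]:
    """Standard FCM membership update, with the dissimilarity used as the cost."""
    costs = [max(proto_dissim(value, p, d), eps) for p in prototypes]
    exponent = 1.0 / (m - 1.0)
    return [1.0 / sum((ci / cj) ** exponent for cj in costs) for ci in costs]


# Two clusters whose prototypes are concentrated on "red" and "blue".
prototypes = [{"red": 1.0}, {"blue": 1.0}]

# Assumed table: "orange" is close to "red" and far from "blue".
learned = table_dissim({("red", "orange"): 0.2})

u_simple = memberships("orange", prototypes, simple_matching)
u_table = memberships("orange", prototypes, learned)
```

Under simple matching, "orange" is equally dissimilar to both prototypes, so its memberships are split evenly. With the assumed table, most of the membership shifts to the "red" cluster, which is the kind of distinction a learned dissimilarity measure is meant to capture.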

[1] Joshua Zhexue Huang, et al. Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values, 1998, Data Mining and Knowledge Discovery.

[2] Zeng-qi Sun, et al. Improved validation index for fuzzy clustering, 2005, Proceedings of the 2005 American Control Conference.

[3] Michael K. Ng, et al. On the Impact of Dissimilarity Measure in k-Modes Clustering Algorithm, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] Gerardo Beni, et al. A Validity Measure for Fuzzy Clustering, 1991, IEEE Trans. Pattern Anal. Mach. Intell.

[5] Mahnhoon Lee. On fuzzy cluster validity indices for the objects of mixed features, 2009, 2009 IEEE International Conference on Fuzzy Systems.

[6] Witold Pedrycz, et al. The fuzzy C-means algorithm with fuzzy P-mode prototypes for clustering objects having mixed features, 2009, Fuzzy Sets Syst.

[7] Witold Pedrycz, et al. Adaptive learning of ordinal-numerical mappings through fuzzy clustering for the objects of mixed features, 2010, Fuzzy Sets Syst.

[8] Zengyou He, et al. Improving K-Modes Algorithm Considering Frequencies of Attribute Values in Mode, 2005, CIS.

[9] Mahnhoon Lee. Fuzzy p-mode prototypes: A generalization of frequency-based cluster prototypes for clustering categorical objects, 2009, 2009 IEEE Symposium on Computational Intelligence and Data Mining.

[10] M. Lee. Mapping of ordinal feature values to numerical values through fuzzy clustering, 2008, 2008 IEEE International Conference on Fuzzy Systems (IEEE World Congress on Computational Intelligence).

[11] Michael K. Ng, et al. A fuzzy k-modes algorithm for clustering categorical data, 1999, IEEE Trans. Fuzzy Syst.

[12] J. C. Dunn, et al. A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters, 1973.

[13] Doheon Lee, et al. Fuzzy clustering of categorical data using fuzzy centroids, 2004, Pattern Recognit. Lett.

[14] Rajesh N. Davé, et al. Validating fuzzy partitions obtained through c-shells clustering, 1996, Pattern Recognit. Lett.

[15] Ohn Mar San, et al. An alternative extension of the k-means algorithm for clustering categorical data, 2004.

[16] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[17] Mahnhoon Lee. Fuzzy cluster validity index based on object proximities defined over fuzzy partition matrices, 2008, 2008 IEEE International Conference on Fuzzy Systems (IEEE World Congress on Computational Intelligence).

[18] James C. Bezdek, et al. Pattern Recognition with Fuzzy Objective Function Algorithms, 1981, Advanced Applications in Pattern Recognition.