A fuzzy k-modes algorithm for clustering categorical data

This correspondence describes extensions to the fuzzy k-means algorithm for clustering categorical data. By using a simple matching dissimilarity measure for categorical objects and modes instead of means for clusters, a new approach is developed, which allows the use of the k-means paradigm to efficiently cluster large categorical data sets. A fuzzy k-modes algorithm is presented and the effectiveness of the algorithm is demonstrated with experimental results.
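The two ingredients named above, a simple matching dissimilarity and cluster modes in place of means, can be illustrated with a minimal Python sketch. It is not the authors' reference implementation. The membership update is assumed to follow the usual fuzzy c-means form with the matching dissimilarity substituted for squared Euclidean distance, and the function names, the fuzziness exponent `alpha`, and the stopping rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def matching_dissimilarity(x, y):
    """Simple matching: number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def update_memberships(data, modes, alpha=1.5):
    """Assumed fuzzy c-means style update using the matching dissimilarity."""
    exponent = 1.0 / (alpha - 1.0)
    memberships = []
    for x in data:
        d = [matching_dissimilarity(x, z) for z in modes]
        if 0 in d:
            # The object coincides with a mode: give it full membership there.
            hit = d.index(0)
            row = [1.0 if l == hit else 0.0 for l in range(len(modes))]
        else:
            row = [1.0 / sum((dl / dh) ** exponent for dh in d) for dl in d]
        memberships.append(row)
    return memberships


def update_modes(data, memberships, k, alpha=1.5):
    """Per attribute, pick the category with the largest summed w**alpha."""
    modes = []
    for l in range(k):
        mode = []
        for j in range(len(data[0])):
            weight = defaultdict(float)
            for x, w in zip(data, memberships):
                weight[x[j]] += w[l] ** alpha
            mode.append(max(weight, key=weight.get))
        modes.append(tuple(mode))
    return modes


def fuzzy_k_modes(data, initial_modes, alpha=1.5, max_iter=100):
    """Alternate the two updates until the modes stop changing (assumed rule)."""
    modes = [tuple(m) for m in initial_modes]
    for _ in range(max_iter):
        memberships = update_memberships(data, modes, alpha)
        new_modes = update_modes(data, memberships, len(modes), alpha)
        if new_modes == modes:
            break
        modes = new_modes
    return modes, memberships
```

Each pass costs time proportional to the number of objects, clusters, and attributes, which is the property of the k-means paradigm that makes it attractive for large data sets. Ties between equally weighted categories are broken here by first occurrence, which is an arbitrary choice of this sketch.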
