Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure

K-Modes is an extension of the K-Means clustering algorithm, developed to cluster categorical data, in which the mean is replaced by the mode. The dissimilarity measure proposed by Huang is simple matching, which counts the attributes on which two objects mismatch. Because the weights of attribute values contribute substantially to clustering, in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of the frequency of attribute values in the cluster to their frequency in the data set. The new weighted measure is evaluated on data sets obtained from the UCI data repository. A comparison with K-Modes and K-representative shows that the new measure generates clusters with high purity.

Keywords: Clustering, categorical data, K-Modes, weighted dissimilarity measure
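To make the contrast concrete, the following is a minimal Python sketch of simple matching alongside one possible frequency-ratio weighting. The abstract does not state the exact formula, so the weighting used here (a matching attribute contributes 1 minus the ratio of the value's count in the cluster to its count in the data set) is an assumption for illustration only, and the function names are hypothetical.

```python
from collections import Counter


def simple_matching(x, mode):
    """Simple matching dissimilarity: count attributes where x and mode differ."""
    return sum(a != b for a, b in zip(x, mode))


def cluster_mode(cluster):
    """Mode of a cluster: the most frequent value of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*cluster))


def weighted_dissimilarity(x, mode, cluster, data):
    """Illustrative weighted measure (ASSUMPTION, not the paper's formula).

    A mismatch with the mode contributes 1. A match contributes
    1 - (count of the value in the cluster / count of the value in the data set),
    so a value concentrated in this cluster is treated as closer.
    """
    total = 0.0
    for j, (a, b) in enumerate(zip(x, mode)):
        if a != b:
            total += 1.0
            continue
        in_cluster = sum(1 for row in cluster if row[j] == a)
        in_data = sum(1 for row in data if row[j] == a)
        # Guard against a value that never occurs in the data set.
        total += 1.0 if in_data == 0 else 1.0 - in_cluster / in_data
    return total


# Toy data: each object has two categorical attributes (colour, size).
data = [("red", "small"), ("red", "large"), ("blue", "small"), ("red", "small")]
cluster = [data[0], data[1], data[3]]
mode = cluster_mode(cluster)
```

Under this assumed weighting, an object that matches the mode on every attribute can still have a nonzero dissimilarity when the matched values are also common outside the cluster, which simple matching cannot express.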

[1]  Pavel Berkhin,et al.  A Survey of Clustering Data Mining Techniques , 2006, Grouping Multidimensional Data.

[2]  Yi Li,et al.  COOLCAT: an entropy-based algorithm for categorical clustering , 2002, CIKM '02.

[3]  Panayiotis Tsaparas,et al.  Clustering Categorical Data based on Information Loss Minimization , 2022 .

[4]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[5]  Vasudha Bhatnagar,et al.  K-means Clustering Algorithm for Categorical Attributes , 1999, DaWaK.

[6]  He Zengyou,et al.  Squeezer: an efficient algorithm for clustering categorical data , 2002 .

[7]  徐晓飞,et al.  Squeezer:An Efficient Algorithm for Clustering Categorical Data , 2002 .

[8]  Joshua Zhexue Huang,et al.  A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining , 1997, DMKD.

[9]  A. K. Pujari,et al.  Data Mining Techniques , 2006 .

[10]  Johannes Gehrke,et al.  CACTUS—clustering categorical data using summaries , 1999, KDD '99.

[11]  George Karypis,et al.  C HAMELEON : A Hierarchical Clustering Algorithm Using Dynamic Modeling , 1999 .

[12]  Ohn Mar San,et al.  An alternative extension of the k-means algorithm for clustering categorical data , 2004 .

[13]  Sudipto Guha,et al.  ROCK: a robust clustering algorithm for categorical attributes , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[14]  Doheon Lee,et al.  Fuzzy clustering of categorical data using fuzzy centroids , 2004, Pattern Recognit. Lett..

[15]  Joshua Zhexue Huang,et al.  Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values , 1998, Data Mining and Knowledge Discovery.

[16]  Limsoon Wong,et al.  DATA MINING TECHNIQUES , 2003 .