Initialization of k-modes clustering for categorical data

The k-modes clustering algorithm is one of the most widely used partitional algorithms for categorical data. Unfortunately, because of its gradient-descent nature, the algorithm is highly sensitive to the initialization of the clustering. Initialization methods for categorical data have been proposed to address this problem. In this paper, we present an overview of clustering initialization methods for numerical data and for categorical data, with an emphasis on their computational efficiency. We then propose a new initialization method for categorical data that obtains good initial cluster centers using a new distance based on the RD and draws on density-based and grid-based methods. Finally, the proposed method is tested on the diagnosis data set, a real-world data set from the UCI Machine Learning Repository, and the experimental results are analyzed; they illustrate that the proposed method is effective and efficient for initializing the clustering of categorical data.
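
To make the sensitivity to initialization concrete, the following is a minimal sketch of k-modes, not the method proposed in this paper. It assumes the simple matching dissimilarity (the number of attributes on which two objects differ) rather than the RD-based distance, which the abstract does not define. The toy data set and the names `matching_distance`, `column_modes`, and `k_modes` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: count of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category per attribute (ties go to the first one seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def nearest(row, modes):
    """Index of the closest mode (ties go to the lowest index)."""
    return min(range(len(modes)), key=lambda j: matching_distance(row, modes[j]))


def k_modes(data, init_modes, max_iter=100):
    """Alternate assignment and mode update until the labels stop changing."""
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(row, modes) for row in data]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = column_modes(members)
    cost = sum(matching_distance(row, modes[lab]) for row, lab in zip(data, labels))
    return labels, modes, cost


# Hypothetical toy data: eight similar objects and two clearly different ones.
data = [("a", "x", "p")] * 4 + [("a", "x", "q")] * 4 + [("b", "z", "r")] * 2

# Initial modes taken from the two different groups.
_, _, cost_spread = k_modes(data, [data[0], data[8]])
# Initial modes taken from two nearly identical objects.
_, _, cost_close = k_modes(data, [data[0], data[4]])

print(cost_spread, cost_close)  # prints: 4 6
```

On this toy data both runs converge, but the second start ends in a poorer local optimum (total dissimilarity 6 instead of 4), which is the behavior that initialization methods aim to avoid.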
