K-modes and Entropy Cluster Centers Initialization Methods

Data clustering is an important unsupervised technique in data mining that aims to extract the natural partitions in a dataset without a priori class information. Unfortunately, clustering models that start from randomly initialized centers are very sensitive to that choice, since the initial centers directly influence the formation of the final clusters. Thus, determining the initial cluster centers is an important issue in clustering models. Previous work has shown that using multiple clustering validity indices in a multiobjective clustering model (e.g., the MODEK-Modes model) yields more accurate results than using a single validity index. In this study, we enhance the performance of the MODEK-Modes model by introducing two new initialization methods: the K-Modes initialization method and the entropy initialization method. Both methods are tested on ten real-life benchmark datasets obtained from the UCI Machine Learning Repository. Experimental results show that the two initialization methods achieve a significant improvement in clustering performance compared with existing initialization methods.
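To illustrate where the initial centers enter a categorical clustering procedure, the following is a minimal sketch of a generic K-Modes loop (simple matching dissimilarity, per-attribute mode update). It is not the MODEK-Modes model or either of the proposed initialization methods; the function names `hamming` and `k_modes`, the `init_modes` parameter, and the toy records are assumptions made for illustration only.

```python
from collections import Counter

def hamming(a, b):
    # Simple matching dissimilarity: number of attributes that differ.
    return sum(x != y for x, y in zip(a, b))

def k_modes(records, init_modes, max_iter=20):
    # init_modes is the set of initial cluster centers; every later
    # assignment and update step starts from this choice.
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        # Assign each record to the nearest mode (ties go to the lowest index).
        new_labels = [
            min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so stop
        labels = new_labels
        # Recompute each mode as the most frequent value per attribute.
        for j in range(len(modes)):
            members = [r for r, l in zip(records, labels) if l == j]
            if members:  # keep the old mode if a cluster is empty
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels, modes

# Hypothetical toy data: six records with three categorical attributes.
records = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r"),
]
labels, modes = k_modes(records, init_modes=[records[0], records[3]])
print(labels, modes)
```

Passing a different `init_modes` list changes the starting point of the loop, which is the step an initialization method is meant to control.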
