K-Harmonic means type clustering algorithm for mixed datasets

A K-Harmonic clustering algorithm for mixed data is presented to reduce the random initialization problem of partitional algorithms. The proposed clustering algorithm uses a distance measure developed for mixed datasets. The experimental results suggest that the clustering results are quite insensitive to random initialization. The proposed algorithm performed better than other clustering algorithms on various datasets.

K-means type clustering algorithms for mixed data, which consist of numeric and categorical attributes, suffer from the cluster center initialization problem: the final clustering results depend on the initial cluster centers. Random cluster center initialization is a popular technique, but clustering results are not consistent across different initializations. The K-Harmonic means clustering algorithm tries to overcome this problem for pure numeric data. In this paper, we extend the K-Harmonic means clustering algorithm to mixed datasets. We propose a definition of a cluster center and a distance measure, and use them with the cost function of the K-Harmonic means clustering algorithm. Experiments were carried out with pure categorical datasets and mixed datasets. Results suggest that the proposed clustering algorithm is quite insensitive to cluster center initialization. Comparative studies with other clustering algorithms show that the proposed algorithm produces better clustering results.
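To make the idea of pairing a mixed-data distance with the K-Harmonic means cost concrete, here is a minimal sketch. The cost has the standard K-Harmonic means form: for each point, the number of centers divided by the sum of inverse powered distances to all centers. The distance used here (squared Euclidean on numeric attributes plus a weighted count of categorical mismatches) is only an illustrative stand-in and is not the distance measure or cluster center definition proposed in the paper. The names `mixed_distance`, `khm_cost`, `gamma`, `p`, and `eps` are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# A mixed data point: (numeric attributes, categorical attributes).
Point = Tuple[Sequence[float], Sequence[str]]


def mixed_distance(a: Point, b: Point, gamma: float = 1.0) -> float:
    """Illustrative mixed dissimilarity (an assumption, not the paper's measure):
    squared Euclidean distance on numeric attributes plus gamma times the
    number of mismatching categorical attributes."""
    numeric = sum((x - y) ** 2 for x, y in zip(a[0], b[0]))
    categorical = sum(1 for x, y in zip(a[1], b[1]) if x != y)
    return numeric + gamma * categorical


def khm_cost(points: List[Point], centers: List[Point],
             p: float = 2.0, eps: float = 1e-12) -> float:
    """K-Harmonic means style cost: sum over points of
    k / sum_j (1 / d(x, c_j) ** p), where k is the number of centers."""
    k = len(centers)
    total = 0.0
    for x in points:
        # eps guards against division by zero when a point coincides with a center.
        inverse_sum = sum(
            1.0 / max(mixed_distance(x, c), eps) ** p for c in centers
        )
        total += k / inverse_sum
    return total


if __name__ == "__main__":
    data = [([0.0], ["a"]), ([1.0], ["b"])]
    good_centers = [([0.0], ["a"]), ([1.0], ["b"])]
    poor_centers = [([10.0], ["c"]), ([11.0], ["c"])]
    print(khm_cost(data, good_centers) < khm_cost(data, poor_centers))
```

Because every center contributes to every point's term through the harmonic average, a point that is far from all centers still influences their positions. This is the property usually credited with making K-Harmonic means less sensitive to initialization than assigning each point only to its nearest center.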
