Attribute weights-based clustering centres algorithm for initialising K-modes clustering

The K-modes algorithm, a partitional clustering method, is popular and effective, and it handles categorical data. However, its performance depends heavily on the initial clustering centres. Random selection of the initial centres commonly leads to non-repeatable clustering results. Hence, a suitable choice of the initial clustering centres is crucial to achieving high-performance K-modes clustering. This article develops an initialisation algorithm for K-modes. At initialisation, the distance between two instances is calculated after weighting the attributes of the instances. Many studies have shown that if the centres are selected based only on the distances between instances or only on their density, the selected centres tend to gather around a single centre or fall on outliers. Therefore, based on the attribute weights, we combine the distance and density measures to select the clustering centres. In experiments on several benchmark datasets from the UCI Machine Learning Repository, the new initialisation method outperformed existing K-modes clustering methods.
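The idea of combining an attribute-weighted distance with a density measure can be illustrated with a minimal sketch. It is not the article's algorithm: the abstract does not give the weighting formula, the density definition, or the way the two measures are combined. Here the weights are assumed to be supplied by the caller, density is assumed to be the average weighted similarity to all instances, and each new centre is assumed to maximise density multiplied by the distance to the nearest centre already chosen. The names `weighted_distance`, `density`, and `init_centres` are hypothetical.

```python
from typing import List, Sequence, Tuple

Instance = Tuple[str, ...]


def weighted_distance(x: Instance, y: Instance, weights: Sequence[float]) -> float:
    """Weighted simple-matching distance: sum the weights of mismatching attributes."""
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


def density(i: int, data: List[Instance], weights: Sequence[float]) -> float:
    """Assumed density: average weighted similarity of instance i to all instances."""
    total = sum(weights)
    sims = (1.0 - weighted_distance(data[i], y, weights) / total for y in data)
    return sum(sims) / len(data)


def init_centres(data: List[Instance], k: int, weights: Sequence[float]) -> List[Instance]:
    """Pick k initial centres that are both dense and far from earlier centres."""
    n = len(data)
    dens = [density(i, data, weights) for i in range(n)]

    # First centre: the densest instance (ties resolved by lowest index).
    chosen = [max(range(n), key=lambda i: dens[i])]

    while len(chosen) < k:
        def score(i: int) -> float:
            # Assumed combination: density times distance to the nearest chosen centre.
            nearest = min(weighted_distance(data[i], data[c], weights) for c in chosen)
            return dens[i] * nearest

        candidates = [i for i in range(n) if i not in chosen]
        chosen.append(max(candidates, key=score))

    return [data[i] for i in chosen]


if __name__ == "__main__":
    # Toy categorical data with two obvious groups; equal weights are an assumption.
    toy = [
        ("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"),
        ("b", "y", "r"), ("b", "y", "r"), ("b", "z", "r"),
    ]
    print(init_centres(toy, k=2, weights=[1.0, 1.0, 1.0]))
```

Multiplying the two measures means that an outlier (far away but sparse) and a near-duplicate of an existing centre (dense but at distance zero) both score low, which is the failure pair the abstract attributes to using distance or density alone. The procedure is deterministic, so repeated runs on the same data return the same centres.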
