An Empirical Seed Initialization Idea for K-Means Algorithm Inspired by CLIQUE Algorithm

The K-means clustering algorithm gives good clustering results when the initial centroids (seeds) are chosen close to the actual cluster centers. This paper presents an empirical centroid initialization technique for the K-means clustering algorithm, inspired by the well-known grid-based clustering algorithm CLIQUE. The initial centroids are assessed with cluster validity indices to check how close they lie to the actual cluster centers. Experiments on datasets show that the technique obtains good initial centroids compared with the state-of-the-art K-means++ algorithm.
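To make the grid-based idea concrete, the following is a minimal sketch of one possible density-driven seeding scheme. It is not the procedure from the paper, whose details the abstract does not give. It assumes an equal-width grid over each dimension and takes the mean of the points in each of the k most populated cells as a seed. The function name `grid_density_seeds` and the `bins` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np

def grid_density_seeds(X, k, bins=10):
    """Illustrative only: pick k seeds as the mean points of the k most
    populated cells of an equal-width grid (an assumption, not the paper's method)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Guard against zero-width dimensions to avoid division by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    # Map each point to an integer cell index per dimension (0 .. bins-1).
    cells = np.minimum(((X - lo) / span * bins).astype(int), bins - 1)
    # Group points by occupied cell and count the points in each cell.
    uniq, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()
    if len(uniq) < k:
        raise ValueError("fewer occupied cells than requested seeds; lower bins or k")
    # Indices of the k densest cells (stable sort keeps ties in cell order).
    densest = np.argsort(-counts, kind="stable")[:k]
    # Seed = mean of the points that fall inside each selected cell.
    return np.array([X[inverse == c].mean(axis=0) for c in densest])
```

The resulting array can be passed as the initial centroids to any K-means implementation that accepts explicit seeds. One limitation of this simple sketch is that neighbouring dense cells may belong to the same cluster, so a practical scheme would likely need to merge or skip adjacent cells.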
