Density K-means: A new algorithm for centers initialization for K-means

K-means is one of the most significant clustering algorithms in data mining. It performs well in many cases, especially on massive data sets. However, the clustering result of K-means depends largely on the initial centers, which makes it difficult for K-means to reach the global optimum. In this paper, we develop a novel algorithm that finds density peaks to optimize the initial centers for K-means. In our experiments, we extensively compared nine clustering algorithms, including ours, on four well-known test data sets. Our algorithm performed significantly better than the other eight, which indicates that it is a valuable method for selecting initial centers for K-means.
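The abstract does not spell out the procedure, so the following is only a minimal sketch of how density-peak seeding for K-means might look, not the paper's actual method. It assumes the usual density-peak heuristic: a good center has a high local density and lies far from any denser point. The function names `density_peak_centers` and `kmeans`, the cutoff distance `d_c`, and the scoring rule (density times separation) are all illustrative assumptions.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def density_peak_centers(points, k, d_c):
    """Pick k initial centers by a density-peak heuristic (assumed, not from the paper).

    rho[i]   = number of other points closer than the cutoff d_c
    delta[i] = distance to the nearest point ranked denser than i
    The k points with the largest rho * delta are returned.
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]

    # Rank by decreasing density; the stable sort breaks ties by index so that
    # two equally dense neighbours are not both treated as peaks.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(d[i])  # the densest point gets the largest distance
        else:
            delta[i] = min(d[i][j] for j in order[:pos])

    ranked = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return [points[i] for i in ranked[:k]]


def kmeans(points, centers, iters=100):
    """Plain Lloyd iterations started from the given centers."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Toy usage: two well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
init = density_peak_centers(points, k=2, d_c=1.0)
centers, labels = kmeans(points, init)
```

Because the seeds are chosen deterministically from the data, repeated runs give the same clustering, unlike random initialization. The cutoff `d_c` is a free parameter in this sketch and would need tuning on real data.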
