There has been considerable work on improving the popular clustering algorithm k-means in terms of both mean squared error (MSE) and speed. However, most k-means variants still compute the distance from each data point to every cluster centroid in every iteration. We propose a fast heuristic that removes this bottleneck with only a marginal increase in MSE. We observe that across the iterations of k-means, a data point changes its membership only among a small subset of clusters. Our heuristic predicts this subset for each data point by examining its nearby clusters after the first iteration of k-means. We augment well-known variants of k-means with our heuristic to demonstrate its effectiveness. On various synthetic and real-world datasets, our heuristic achieves speedups of up to 3x compared to efficient variants of k-means.
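The idea above can be sketched in a few lines of NumPy: run one full assignment pass, record each point's t nearest centroids as its candidate set, and restrict all subsequent distance computations to those candidates. This is a minimal illustrative sketch, not the paper's implementation; the candidate-set size `t` and the "t nearest centroids after iteration one" selection rule are assumptions made for illustration.

```python
import numpy as np

def kmeans_with_candidates(X, k, t=3, n_iters=20, seed=0):
    """K-means sketch: after one full iteration, each point only compares
    itself against its t nearest centroids from that iteration.
    (t and the candidate rule are illustrative assumptions.)"""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    # Iteration 1: full n-by-k distance computation.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Candidate clusters per point: indices of the t nearest centroids.
    candidates = np.argsort(d, axis=1)[:, :t]

    for _ in range(n_iters - 1):
        # Update step: recompute centroids (keep old one if cluster empties).
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
        # Assignment step restricted to each point's candidate set:
        # only n-by-t distances instead of n-by-k.
        cand_cent = centroids[candidates]                        # (n, t, dim)
        dc = np.linalg.norm(X[:, None, :] - cand_cent, axis=2)   # (n, t)
        labels = candidates[np.arange(len(X)), dc.argmin(axis=1)]
    return labels, centroids
```

With t much smaller than k, the per-iteration assignment cost drops from O(nk) to O(nt) distance evaluations, which is where the reported speedup comes from; the risk, and the source of the marginal MSE increase, is that a point's true nearest centroid may drift outside its candidate set in later iterations.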