How Fast Is the k-Means Method?

Abstract: We present polynomial upper and lower bounds on the number of iterations performed by the k-means method (a.k.a. Lloyd’s method) for k-means clustering. Our upper bounds are polynomial in the number of points, number of clusters, and the spread of the point set. We also present a lower bound, showing that in the worst case the k-means heuristic needs to perform Ω(n) iterations, for n points on the real line and two centers. Surprisingly, the spread of the point set in this construction is polynomial. This is the first construction showing that the k-means heuristic requires more than a polylogarithmic number of iterations. Furthermore, we present two alternative algorithms, with guaranteed performance, which are simple variants of the k-means method. Results of our experimental studies on these algorithms are also presented.
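
To make the quantities in the abstract concrete, the following is a minimal sketch of the k-means (Lloyd's) iteration that also counts how many rounds it runs before the assignment stops changing. It is an illustration only, not the paper's construction or either of its variant algorithms. The function names `spread` and `lloyd_iterations`, the tie-breaking rule (lowest center index), and the handling of empty clusters (keep the old center) are assumptions made here. The spread is taken to be the ratio of the largest to the smallest pairwise distance, which is the usual definition but is not spelled out in the abstract.

```python
import math
from itertools import combinations


def spread(points):
    """Ratio of the largest to the smallest pairwise distance.

    Assumes the points are distinct tuples of equal dimension.
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    return max(dists) / min(dists)


def lloyd_iterations(points, centers, max_iter=10_000):
    """Run Lloyd's method from the given initial centers.

    Returns (final_centers, rounds), where rounds is the number of
    assign-and-recenter rounds performed before the assignment of
    points to centers stopped changing.
    """
    centers = list(centers)
    assignment = None
    for rounds in range(max_iter):
        # Assignment step: each point goes to its nearest center
        # (ties broken toward the lowest index; an assumption).
        new_assignment = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            return centers, rounds
        assignment = new_assignment

        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for j, old_center in enumerate(centers):
            members = [p for p, a in zip(points, assignment) if a == j]
            if not members:
                # Empty cluster: keep the old center (an assumption).
                new_centers.append(old_center)
                continue
            dim = len(members[0])
            new_centers.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
            )
        centers = new_centers
    return centers, max_iter


# Toy input: four points on the real line, two centers.
pts = [(0,), (1,), (10,), (11,)]
final_centers, rounds = lloyd_iterations(pts, [(0,), (1,)])
print(final_centers, rounds, spread(pts))
```

On this toy input the method stabilizes after two rounds with centers at 0.5 and 10.5, and the spread is 11. The lower bound in the abstract concerns carefully constructed inputs on which the round count grows linearly in the number of points; this small example does not reproduce that behavior.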
