k-means Requires Exponentially Many Iterations Even in the Plane

The k-means algorithm is a well-known method for partitioning n points in d-dimensional space into k clusters. Its main features are simplicity and speed in practice. Theoretically, however, the best known upper bound on its running time, O(n^{kd}), is in general exponential in the number of points (when kd = Ω(n / log n)). Recently, Arthur and Vassilvitskii [2] showed a super-polynomial worst-case analysis, improving the best known lower bound from Ω(n) to 2^{Ω(√n)} with a construction in d = Ω(√n) dimensions. In [2] they also conjectured the existence of super-polynomial lower bounds for any d ≥ 2. Our contribution is twofold: we prove this conjecture, and we improve the lower bound by presenting a simple construction in the plane that leads to the exponential lower bound 2^{Ω(n)}.
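The quantity bounded in the abstract is the number of iterations the k-means method performs before the clustering stops changing. As a minimal sketch of what is being counted, the following assumes the standard Lloyd-style alternation of an assignment step and a center-update step on points in the plane. The function name `lloyd_iterations`, the `max_iter` cap, and the tie-breaking and empty-cluster rules are illustrative assumptions, and the example input is a toy instance, not the lower-bound construction of the paper.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def lloyd_iterations(points, centers, max_iter=10_000):
    """Run a Lloyd-style k-means loop and return (final_centers, iterations).

    points  : list of (x, y) tuples
    centers : list of k initial (x, y) centers (hypothetical starting choice)
    The returned count is the number of iterations that changed the clustering.
    """
    centers = list(centers)
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest center
        # (assumption: ties go to the lowest center index).
        new_assignment = [
            min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            # The clustering is stable, so the method has converged.
            return centers, iteration - 1
        assignment = new_assignment
        # Update step: move each center to the mean of its cluster.
        for j in range(len(centers)):
            cluster = [p for p, a in zip(points, assignment) if a == j]
            if cluster:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centers, max_iter


# Toy usage: two well-separated pairs of points converge after one update.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
centers, iters = lloyd_iterations(points, [(0, 0), (10, 0)])
```

On typical inputs such a loop stops after few iterations; the lower bound stated above says that carefully chosen planar inputs and initial centers can force 2^{Ω(n)} of them.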

[1] Nabil H. Mustafa, et al. k-means projective clustering, 2004, PODS.

[2] Sergei Vassilvitskii, et al. How slow is the k-means method?, 2006, SCG '06.

[3] S. P. Lloyd, et al. Least squares quantization in PCM, 1982, IEEE Trans. Inf. Theory.

[4] Sanjoy Dasgupta. How Fast Is k-Means?, 2003, COLT.

[5] Shang-Hua Teng, et al. Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time, 2001, STOC '01.

[6] David G. Stork, et al. Pattern Classification, 1973.

[7] Sergei Vassilvitskii, et al. Worst-case and Smoothed Analysis of the ICP Algorithm, with an Application to the k-means Method, 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[8] F. Gibou. A fast hybrid k-means level set algorithm for segmentation, 2005.

[9] Hiroshi Imai, et al. Variance-based K-clustering algorithms by Voronoi Diagrams and randomization, 2000.

[10] Andrea Vattani. k-means Requires Exponentially Many Iterations Even in the Plane, 2011, Discret. Comput. Geom.

[11] Sariel Har-Peled, et al. How Fast Is the k-Means Method?, 2005, SODA '05.

[12] Bodo Manthey, et al. k-Means Has Polynomial Smoothed Complexity, 2009, 2009 50th Annual IEEE Symposium on Foundations of Computer Science.

[13] Pavel Berkhin, et al. A Survey of Clustering Data Mining Techniques, 2006, Grouping Multidimensional Data.

[14] E. Forgy, et al. Cluster analysis of multivariate data: efficiency versus interpretability of classifications, 1965.

[15] Bodo Manthey, et al. Improved smoothed analysis of the k-means method, 2009, SODA.

[16] David M. Mount, et al. A local search approximation algorithm for k-means clustering, 2004.

[17] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[18] David M. Mount, et al. A local search approximation algorithm for k-means clustering, 2002, SCG '02.

[19] Shigeo Abe. Pattern Classification, 2001, Springer London.