The planar k-means problem is NP-hard

In the k-means problem, we are given a finite set S of points in ℝ^m and an integer k ≥ 1, and we want to find k points (centers) that minimize the sum, over all points in S, of the squared Euclidean distance from each point to its nearest center. We show that this well-known problem is NP-hard even for instances in the plane, answering an open question posed by Dasgupta (2007) [7].
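To make the objective concrete, the following is a minimal sketch of how the cost of a candidate set of centers could be evaluated. The function names `squared_distance` and `kmeans_cost` are illustrative assumptions, not taken from the paper, and the sketch only evaluates the objective for given centers; it does not search for optimal ones, which is the part shown to be NP-hard.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(p: Point, c: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    if len(p) != len(c):
        raise ValueError("points must have the same dimension")
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum over all points of the squared distance to the nearest center.

    Hypothetical helper: evaluates the k-means objective for fixed centers.
    """
    if not centers:
        raise ValueError("at least one center is required (k >= 1)")
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Four planar points forming two well-separated pairs, with k = 2 centers
# placed at the midpoint of each pair.
S = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
cost = kmeans_cost(S, centers)
print(cost)
```

In this toy instance each point lies at distance 1 from its nearest center, so the cost is the sum of four unit squared distances.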

[1]  Nimrod Megiddo, et al. On the Complexity of Some Common Geometric Location Problems, 1984, SIAM J. Comput.

[2]  Alan M. Frieze, et al. Clustering Large Graphs via the Singular Value Decomposition, 2004, Machine Learning.

[3]  Sergei Vassilvitskii, et al. Worst-case and Smoothed Analysis of the ICP Algorithm, with an Application to the k-means Method, 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[4]  László Lovász, et al. Algorithmic theory of numbers, graphs and convexity, 1986, CBMS-NSF regional conference series in applied mathematics.

[5]  David Lichtenstein, et al. Planar Formulae and Their Uses, 1982, SIAM J. Comput.

[6]  S. P. Lloyd, et al. Least squares quantization in PCM, 1982, IEEE Trans. Inf. Theory.

[7]  Leslie G. Valiant, et al. Universality considerations in VLSI circuits, 1981, IEEE Transactions on Computers.

[8]  Eric Allender, et al. The Directed Planar Reachability Problem, 2005, FSTTCS.

[9]  Sergei Vassilvitskii, et al. How slow is the k-means method?, 2006, SCG '06.

[10]  Charles E. Leiserson, et al. Area-efficient graph layouts, 1980, 21st Annual Symposium on Foundations of Computer Science (sfcs 1980).

[11]  Matt Gibson, et al. On clustering to minimize the sum of radii, 2008, SODA '08.

[12]  Charles E. Leiserson, et al. Area-Efficient Graph Layouts (for VLSI), 1980, FOCS.

[13]  Sariel Har-Peled, et al. How Fast Is the k-Means Method?, 2005, SODA '05.

[14]  Marek Karpinski, et al. Approximation schemes for clustering problems, 2003, STOC '03.

[15]  Steven A. Orszag, et al. CBMS-NSF REGIONAL CONFERENCE SERIES IN APPLIED MATHEMATICS, 1978.

[16]  Sanjeev Arora, et al. Polynomial time approximation schemes for Euclidean TSP and other geometric problems, 1996, Proceedings of 37th Conference on Foundations of Computer Science.

[17]  Rafail Ostrovsky, et al. The Effectiveness of Lloyd-Type Methods for the k-Means Problem, 2006, FOCS.

[18]  Mary Inaba, et al. Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering: (extended abstract), 1994, SCG '94.

[19]  S. Dasgupta. The hardness of k-means clustering, 2008.

[20]  Amit Kumar, et al. A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions, 2004, 45th Annual IEEE Symposium on Foundations of Computer Science.

[21]  David M. Mount, et al. A local search approximation algorithm for k-means clustering, 2002, SCG '02.

[22]  Sergei Vassilvitskii, et al. k-means++: the advantages of careful seeding, 2007, SODA '07.

[23]  Pierre Hansen, et al. NP-hardness of Euclidean sum-of-squares clustering, 2008, Machine Learning.