Paper information - Tight lower bound instances for k-means++ in two dimensions

Tight lower bound instances for k-means++ in two dimensions

The k-means++ seeding algorithm is one of the most popular algorithms for choosing the initial k centers when running Lloyd's algorithm for the k-means problem. Brunsch and Röglin [8] conjectured that k-means++ behaves well on datasets of small dimension. More specifically, they conjectured that the k-means++ seeding algorithm gives an $O(\log d)$ approximation with high probability for any d-dimensional dataset. In this work, we refute this conjecture by giving two-dimensional datasets on which the k-means++ seeding algorithm achieves an $O(\log k)$ approximation ratio only with probability exponentially small in k. This solves open problems posed by Mahajan et al. [10] and by Brunsch and Röglin [8].
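For readers unfamiliar with the seeding step the abstract refers to, the following is a minimal sketch of the standard k-means++ seeding rule (D² sampling) as introduced in [13]. It is not the lower-bound construction of this paper. The function names `sq_dist`, `kmeanspp_seed`, and `kmeans_cost` are hypothetical and chosen only for illustration, and the uniform fallback when all remaining weights are zero is an assumption of this sketch.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from `points` by D^2 sampling (illustrative sketch)."""
    if rng is None:
        rng = random.Random()
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Assumption of this sketch: if every point coincides with a
            # chosen center, fall back to a uniform choice.
            nxt = rng.choice(points)
        else:
            # Each point is chosen with probability proportional to d2[i].
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

The approximation ratio discussed in the abstract compares `kmeans_cost` of the sampled centers with the cost of an optimal set of k centers. Computing the optimum is NP-hard in general, so the sketch does not attempt it.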

Nir Ailon | Ragesh Jaiswal | Anup Bhattacharya

[1] Nir Ailon, et al. Streaming k-means approximation, 2009, NIPS.

[2] Sergei Vassilvitskii, et al. Scalable K-Means++, 2012, Proc. VLDB Endow.

[3] Sergei Vassilvitskii, et al. Worst-Case and Smoothed Analysis of the ICP Algorithm, with an Application to the k-Means Method, 2009, SIAM J. Comput.

[4] Manu Agarwal, et al. k-Means++ under approximation stability, 2015, Theor. Comput. Sci.

[5] Johannes Blömer, et al. Bregman Clustering for Separable Instances, 2010, SWAT.

[6] Ankit Aggarwal, et al. Adaptive Sampling for k-Means Clustering, 2009, APPROX-RANDOM.

[7] Klaus Jansen, et al. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 2012, Lecture Notes in Computer Science.

[8] Heiko Röglin, et al. A bad instance for k-means++, 2013, Theor. Comput. Sci.

[9] Sergei Vassilvitskii, et al. How slow is the k-means method?, 2006, SCG '06.

[10] Meena Mahajan, et al. The Planar k-means Problem is NP-hard I, 2009.

[11] S. Dasgupta. The hardness of k-means clustering, 2008.

[12] Moni Naor, et al. Theory and Applications of Models of Computation, 2015, Lecture Notes in Computer Science.

[13] Sergei Vassilvitskii, et al. k-means++: the advantages of careful seeding, 2007, SODA '07.

[14] Nitin Garg, et al. Analysis of k-Means++ for Separable Data, 2012, APPROX-RANDOM.