On Euclidean $k$-Means Clustering with $\alpha$-Center Proximity

k-means clustering is NP-hard in the worst case, but previous work has shown efficient algorithms assuming that the optimal k-means clusters are stable under additive or multiplicative perturbation of the data. This approach has two caveats. First, we do not know how to efficiently verify this property of optimal solutions that are NP-hard to compute in the first place. Second, the stability assumptions required for polynomial-time k-means algorithms are often unreasonable when compared to the ground-truth clusters in real-world data. A consequence of multiplicative perturbation resilience is center proximity, that is, every point is closer to the center of its own cluster than to the center of any other cluster, by some multiplicative factor α > 1. We study the problem of minimizing the Euclidean k-means objective only over clusterings that satisfy α-center proximity. We give a simple algorithm to find the optimal α-center-proximal k-means clustering in running time exponential in k and 1/(α − 1) but linear in the number of points and the dimension. We define an analogous α-center proximity condition for outliers, and give similar algorithmic guarantees for k-means with outliers under α-center proximity. On the hardness side, we show that for any α′ > 1 there exists an α with 1 < α ≤ α′ and an ε0 > 0 such that minimizing the k-means objective over clusterings that satisfy α-center proximity is NP-hard to approximate within a multiplicative (1 + ε0) factor.
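To make the α-center proximity condition concrete, here is a minimal sketch (in Python with numpy; the function names are illustrative and not from the paper) that evaluates the Euclidean k-means objective of a given clustering and checks whether it satisfies α-center proximity, i.e., whether every point is closer to its own center than to any other center by a factor of α.

import numpy as np

def kmeans_cost(points, labels, centers):
    # Euclidean k-means objective: sum of squared distances from each
    # point to the center of its assigned cluster.
    diffs = points - centers[labels]
    return float(np.sum(diffs ** 2))

def satisfies_center_proximity(points, labels, centers, alpha):
    # alpha-center proximity: for every point x assigned to center c_i and
    # every other center c_j, require alpha * d(x, c_i) < d(x, c_j).
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    n = points.shape[0]
    own = dists[np.arange(n), labels]        # distance to assigned center
    others = dists.copy()
    others[np.arange(n), labels] = np.inf    # ignore the assigned center
    return bool(np.all(alpha * own < others.min(axis=1)))

# Toy usage: two well-separated clusters on the line, alpha = 2.
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.05, 0.0], [5.05, 0.0]])
print(kmeans_cost(points, labels, centers))                      # 0.01
print(satisfies_center_proximity(points, labels, centers, 2.0))  # True

A brute-force approach to the optimization problem studied here would enumerate candidate clusterings, keep only those passing this check, and return the one of smallest cost; the paper's algorithm avoids such enumeration and, as stated above, runs in time exponential only in k and 1/(α − 1) and linear in the number of points and the dimension.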
