Inapproximability of Clustering in Lp Metrics

Proving hardness of approximation for min-sum objectives is notoriously difficult. For classic problems such as the Traveling Salesman problem, the Steiner tree problem, and the k-means and k-median problems, the best known inapproximability bounds for L_p metrics of dimension O(log n) remain well below 1.01. In this paper, we take a significant step toward improving the hardness of approximation of the k-means problem in various L_p metrics, in particular the Manhattan (L_1), Euclidean (L_2), Hamming (L_0), and Chebyshev (L_∞) metrics of dimension log n and above. We show that it is hard to approximate the k-means objective in O(log n)-dimensional space:

(1) To a factor of 3.94 in the L_∞ metric when centers must be chosen from a discrete set of locations (the discrete case). This improves upon the result of Guruswami and Indyk (SODA'03), who proved hardness of approximation for a factor less than 1.01.
(2) To a factor of 1.56 in the L_1 metric and to a factor of 1.17 in the L_2 metric, both in the discrete case. This improves upon the result of Trevisan (SICOMP'00), who proved hardness of approximation for a factor less than 1.01 in both metrics.
(3) To a factor of 1.07 in the L_2 metric when centers can be placed at arbitrary locations (the continuous case). This improves upon the result of Lee-Schmidt-Wright (IPL'17), who proved hardness of approximation for a factor of 1.0013.

We also obtain similar improvements over the state-of-the-art hardness of approximation results for the k-median objective in various L_p metrics. The hardness result in (1) holds under the standard assumption that P ≠ NP, whereas all the remaining results above hold under the Unique Games Conjecture (UGC). We can remove the reliance on UGC and prove standard NP-hardness for these problems, but only for smaller approximation factors. Finally, to obtain our results for the L_1 and L_∞ metrics in O(log n)-dimensional space, we introduce an embedding technique that combines the transcripts of certain communication protocols with the geometric realization of certain graphs.
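
To make the objective concrete, the following is a minimal illustrative sketch of the k-means cost in an L_p metric and of a brute-force search in the discrete case, where centers come from a given candidate set. It is not taken from the paper: the function names (`lp_dist`, `kmeans_cost`, `best_discrete_centers`) are hypothetical, and the exhaustive search is only meant for tiny inputs.

```python
from itertools import combinations

def lp_dist(x, y, p):
    """Distance between points x and y in the L_p metric (p may be float('inf'))."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)  # Chebyshev distance
    return sum(d ** p for d in diffs) ** (1.0 / p)

def kmeans_cost(points, centers, p):
    """k-means objective: sum over points of the squared distance to the nearest center."""
    return sum(min(lp_dist(x, c, p) for c in centers) ** 2 for x in points)

def best_discrete_centers(points, candidates, k, p):
    """Discrete case (assumed setup): try every k-subset of the candidate locations."""
    return min(combinations(candidates, k),
               key=lambda cs: kmeans_cost(points, cs, p))

if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (4, 0), (4, 2)]
    cands = [(0, 1), (4, 1), (2, 1)]
    for p in (1, 2, float("inf")):
        chosen = best_discrete_centers(pts, cands, 2, p)
        print(p, chosen, kmeans_cost(pts, chosen, p))
```

In the continuous case the centers would instead range over all points of the space, so no finite candidate list is supplied; the sketch above covers only the discrete variant.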

[1]  Ola Svensson,et al.  Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms , 2016, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[2]  Lijie Chen,et al.  On The Hardness of Approximate and Exact (Bichromatic) Maximum Inner Product , 2018, Electron. Colloquium Comput. Complex..

[3]  Euiwoong Lee,et al.  Improved and simplified inapproximability for k-means , 2015, Inf. Process. Lett..

[4]  Philip N. Klein,et al.  Local Search Yields Approximation Schemes for k-Means and k-Median in Euclidean and Minor-Free Metrics , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[5]  Aviad Rubinstein,et al.  Hardness of approximate nearest neighbor search , 2018, STOC.

[6]  Amit Kumar,et al.  A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions , 2004, 45th Annual IEEE Symposium on Foundations of Computer Science.

[7]  S. Dasgupta The hardness of k-means clustering , 2008 .

[8]  Mihir Bellare,et al.  Free Bits, PCPs, and Nonapproximability-Towards Tight Results , 1998, SIAM J. Comput..

[9]  A. Razborov Communication Complexity , 2011 .

[10]  Luca Trevisan,et al.  When Hamming Meets Euclid: The Approximability of Geometric TSP and Steiner Tree , 2000, SIAM J. Comput..

[11]  Luca Trevisan,et al.  Non-approximability results for optimization problems on bounded degree instances , 2001, STOC '01.

[12]  Guy Kindler,et al.  On non-optimally expanding sets in Grassmann graphs , 2017, Electron. Colloquium Comput. Complex..

[13]  W. B. Johnson,et al.  Extensions of Lipschitz mappings into Hilbert space , 1984 .

[14]  Hiroshi Maehara Dispersed points and geometric embedding of complete bipartite graphs , 1991, Discret. Comput. Geom..

[15]  Pravesh Kothari,et al.  Small-Set Expansion in Shortcode Graph and the 2-to-2 Conjecture , 2018, Electron. Colloquium Comput. Complex..

[16]  Mohammad R. Salavatipour,et al.  Local Search Yields a PTAS for k-Means in Doubling Metrics , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[17]  Vincent Cohen-Addad,et al.  A Fast Approximation Scheme for Low-Dimensional k-Means , 2017, SODA.

[18]  Hiroshi Maehara Contact patterns of equal nonoverlapping spheres , 1985, Graphs Comb..

[19]  Aravind Srinivasan,et al.  An Improved Approximation for k-Median and Positive Correlation in Budgeted Optimization , 2014, SODA.

[20]  Andrew Chi-Chih Yao,et al.  Some complexity questions related to distributive computing (Preliminary Report) , 1979, STOC.

[21]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[22]  Alan Roytman,et al.  The Bane of Low-Dimensionality Clustering , 2017, SODA.

[23]  Subhash Khot,et al.  On independent sets, 2-to-2 games, and Grassmann graphs , 2017, Electron. Colloquium Comput. Complex..

[24]  David Saulpic,et al.  Near-Linear Time Approximation Schemes for Clustering in Doubling Metrics , 2018, J. ACM.

[25]  Per Austrin,et al.  Global Cardinality Constraints Make Approximating Some Max-2-CSPs Harder , 2019, APPROX-RANDOM.

[26]  Ravishankar Krishnaswamy,et al.  The Hardness of Approximation of Euclidean k-Means , 2015, SoCG.

[27]  Nimrod Megiddo,et al.  On the Complexity of Some Common Geometric Location Problems , 1984, SIAM J. Comput..

[28]  Kenneth W. Shum,et al.  A low-complexity algorithm for the construction of algebraic-geometric codes better than the Gilbert-Varshamov bound , 2001, IEEE Trans. Inf. Theory.

[29]  Peter Frankl,et al.  Embedding the n-cube in Lower Dimensions , 1986, Eur. J. Comb..

[30]  Bundit Laekhanukit,et al.  On the Complexity of Closest Pair via Polar-Pair of Point-Sets , 2016, SoCG.

[31]  Richard Ryan Williams,et al.  Distributed PCP Theorems for Hardness of Approximation in P , 2017, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[32]  Pasin Manurangsi,et al.  On Closest Pair in Euclidean Metric: Monochromatic is as Hard as Bichromatic , 2018, Combinatorica.

[33]  Marek Karpinski,et al.  Approximation schemes for clustering problems , 2003, STOC '03.

[34]  E. Gilbert A comparison of signalling alphabets , 1952 .

[35]  Peter Frankl,et al.  On the contact dimensions of graphs , 1988, Discret. Comput. Geom..

[36]  Amit Kumar,et al.  Tight FPT Approximations for k-Median and k-Means , 2019, ICALP.

[37]  J. Pach Decomposition of multiple packing and covering , 1980 .

[38]  Ryan O'Donnell,et al.  Derandomized dimensionality reduction with applications , 2002, SODA '02.

[39]  Samir Khuller,et al.  Greedy strikes back: improved facility location algorithms , 1998, SODA '98.

[40]  Pasin Manurangsi,et al.  A Note on Max k-Vertex Cover: Faster FPT-AS, Smaller Approximate Kernel and Improved Approximation , 2018, SOSA.

[41]  Subhash Khot,et al.  Inapproximability of Vertex Cover and Independent Set in Bounded Degree Graphs , 2009, 2009 24th Annual IEEE Conference on Computational Complexity.

[42]  H. Stichtenoth,et al.  On the Asymptotic Behaviour of Some Towers of Function Fields over Finite Fields , 1996 .

[43]  Pasin Manurangsi,et al.  On the parameterized complexity of approximating dominating set , 2017, Electron. Colloquium Comput. Complex..

[44]  Subhash Khot,et al.  UG-hardness to NP-hardness by losing half , 2019, Electron. Colloquium Comput. Complex..

[45]  Guy Kindler,et al.  Towards a proof of the 2-to-1 games conjecture? , 2018, Electron. Colloquium Comput. Complex..

[46]  Subhash Khot,et al.  Pseudorandom Sets in Grassmann Graph Have Near-Perfect Expansion , 2018, 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS).