Johnson Coverage Hypothesis: Inapproximability of k-means and k-median in ℓp-metrics

k-median and k-means are the two most popular objectives for clustering algorithms. Despite intensive effort, a good understanding of the approximability of these objectives, particularly in ℓp-metrics, remains a major open problem. In this paper, we significantly improve upon the hardness of approximation factors known in the literature for these objectives in ℓp-metrics. We introduce a new hypothesis called the Johnson Coverage Hypothesis (JCH), which roughly asserts that the well-studied Max k-Coverage problem on set systems is hard to approximate to a factor greater than (1 − 1/e), even when the membership graph of the set system is a subgraph of the Johnson graph. We then show that, together with generalizations of the embedding techniques introduced by Cohen-Addad and Karthik (FOCS ’19), JCH implies hardness of approximation results for k-median and k-means in ℓp-metrics for factors close to those obtained for general metrics. In particular, assuming JCH, we show that it is hard to approximate the k-means objective:
• Discrete case: to a factor of 3.94 in the ℓ1-metric and to a factor of 1.73 in the ℓ2-metric; this improves upon the previous factors of 1.56 and 1.17, respectively, obtained under the Unique Games Conjecture (UGC).
• Continuous case: to a factor of 2.10 in the ℓ1-metric and to a factor of 1.36 in the ℓ2-metric; this improves upon the previous factor of 1.07 in the ℓ2-metric obtained under UGC (to the best of our knowledge, the continuous case of k-means in the ℓ1-metric has not previously been analyzed in the literature).
We also obtain similar improvements under JCH for the k-median objective. Additionally, we prove a weak version of JCH using the work of Dinur et al. (SICOMP ’05) on Hypergraph Vertex Cover, and recover all of the UGC-based results of Cohen-Addad and Karthik (FOCS ’19) stated above with (nearly) the same inapproximability factors, but now under the standard NP ≠ P assumption (instead of UGC).
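The abstract distinguishes the discrete case, where centers must be chosen from a given candidate set, from the continuous case, where centers may lie anywhere in the space. As a minimal sketch under the standard definitions (k-median charges each point its ℓp distance to the nearest center; k-means charges the square of that distance), the Python below evaluates both objectives. The function names are hypothetical, and the exhaustive search in `best_discrete` is only for illustration on tiny inputs; it is not the reduction or any algorithm from the paper.

```python
from collections.abc import Sequence
from itertools import combinations
from statistics import median

Point = tuple[float, ...]


def lp_dist(x: Point, y: Point, p: float) -> float:
    """ℓp distance between two points of the same dimension (p >= 1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def clustering_cost(points: Sequence[Point], centers: Sequence[Point],
                    p: float, objective: str) -> float:
    """Each point pays its ℓp distance to the nearest center.

    objective == "median": sum of distances (k-median).
    objective == "means":  sum of squared distances (k-means).
    """
    if objective not in ("median", "means"):
        raise ValueError("objective must be 'median' or 'means'")
    power = 1 if objective == "median" else 2
    return sum(min(lp_dist(x, c, p) for c in centers) ** power for x in points)


def best_discrete(points: Sequence[Point], candidates: Sequence[Point],
                  k: int, p: float, objective: str):
    """Discrete case: the k centers must come from a given candidate set.

    Exhaustive search over all k-subsets; exponential, for illustration only.
    """
    best = min(combinations(candidates, k),
               key=lambda cs: clustering_cost(points, cs, p, objective))
    return best, clustering_cost(points, best, p, objective)


def centroid(cluster: Sequence[Point]) -> Point:
    """Optimal continuous center of a single cluster for k-means in ℓ2."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def coordinatewise_median(cluster: Sequence[Point]) -> Point:
    """Optimal continuous center of a single cluster for k-median in ℓ1."""
    return tuple(float(median(coords)) for coords in zip(*cluster))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (2.0, 0.0)]
    # Discrete, candidates = input points: best k-means cost in ℓ2 is 4.0.
    print(best_discrete(pts, pts, k=1, p=2, objective="means"))
    # Continuous: the centroid (1.0, 0.0) halves the cost to 2.0.
    print(clustering_cost(pts, [centroid(pts)], p=2, objective="means"))
```

On the two points (0, 0) and (2, 0) with k = 1 in ℓ2, restricting centers to the input points yields k-means cost 4, whereas the continuous optimum at the centroid (1, 0) costs 2, so the discrete and continuous versions can have different optimal values. The two center helpers rely on standard facts: the mean minimizes the sum of squared ℓ2 distances, and because ℓ1 distance splits across coordinates, a coordinate-wise median minimizes the sum of ℓ1 distances.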
Finally, we establish a strong connection between JCH and the long-standing open problem of determining the Hypergraph Turán number. We then use this connection to prove improved SDP gaps (over the existing factors in the literature) for the k-means and k-median objectives.

*Google Research, Switzerland. vcohenad@gmail.com.
†Rutgers University, USA. karthik.cs@rutgers.edu.
‡University of Michigan, USA. euiwoong@umich.edu.

[1] Johan Håstad, et al. Some optimal inapproximability results, 2001, JACM.

[2] Guy Kindler, et al. Optimal inapproximability results for MAX-CUT and other 2-variable CSPs?, 2004, 45th Annual IEEE Symposium on Foundations of Computer Science.

[3] Ran Raz, et al. A parallel repetition theorem, 1995, STOC '95.

[4] Subhash Khot, et al. Inapproximability of Vertex Cover and Independent Set in Bounded Degree Graphs, 2009, 2009 24th Annual IEEE Conference on Computational Complexity.

[5] Hooyeon Lee, et al. Approximating low-dimensional coverage problems, 2011, SoCG '12.

[6] Michael Langberg, et al. A unified framework for approximating and clustering data, 2011, STOC.

[7] Euiwoong Lee, et al. Improved and simplified inapproximability for k-means, 2015, Inf. Process. Lett.

[8] Pasin Manurangsi, et al. A Note on Max k-Vertex Cover: Faster FPT-AS, Smaller Approximate Kernel and Improved Approximation, 2018, SOSA.

[9] Subhash Khot, et al. Pseudorandom Sets in Grassmann Graph Have Near-Perfect Expansion, 2018, 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS).

[10] T. Sanders, et al. Analysis of Boolean Functions, 2012, ArXiv.

[11] Russell Impagliazzo, et al. Which problems have strongly exponential complexity?, 1998, Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280).

[12] Michael R. Fellows, et al. Fixed-Parameter Tractability and Completeness II: On Completeness for W[1], 1995, Theor. Comput. Sci.

[13] Madhur Tulsiani, et al. Approximating Constraint Satisfaction Problems on High-Dimensional Expanders, 2019, 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS).

[14] Badih Ghazi, et al. LP/SDP Hierarchy Lower Bounds for Decoding Random LDPC Codes, 2014, IEEE Transactions on Information Theory.

[15] Amit Kumar, et al. Linear-time approximation schemes for clustering problems in any dimensions, 2010, JACM.

[16] Subhash Khot, et al. UG-hardness to NP-hardness by losing half, 2019, Electron. Colloquium Comput. Complex.

[17] Jerrold R. Griggs, et al. Journal of Combinatorial Theory, Series A, 2011.

[18] David Saulpic, et al. Near-Linear Time Approximation Schemes for Clustering in Doubling Metrics, 2018, J. ACM.

[19] Pasin Manurangsi, et al. On the parameterized complexity of approximating dominating set, 2017, Electron. Colloquium Comput. Complex.

[20] Michael T. Goodrich, et al. Almost optimal set covers in finite VC-dimension, 1995, Discret. Comput. Geom.

[21] Guy Kindler, et al. On non-optimally expanding sets in Grassmann graphs, 2017, Electron. Colloquium Comput. Complex.

[22] Subhash Khot, et al. On independent sets, 2-to-2 games, and Grassmann graphs, 2017, Electron. Colloquium Comput. Complex.

[23] Sanjeev Arora, et al. Probabilistic checking of proofs: a new characterization of NP, 1998, JACM.

[24] Peter Frankl, et al. On the contact dimensions of graphs, 1988, Discret. Comput. Geom.

[25] Alexander Sidorenko, et al. Upper Bounds for Turán Numbers, 1997, J. Comb. Theory, Ser. A.

[26] E. Gilbert. A comparison of signalling alphabets, 1952.

[27] Nisheeth K. Vishnoi, et al. The Unique Games Conjecture, Integrality Gap for Cut Problems and Embeddability of Negative Type Metrics into ℓ1, 2005, FOCS.

[28] Nisheeth K. Vishnoi, et al. Unique games on expanding constraint graphs are easy: extended abstract, 2008, STOC.

[29] J. Pach. Decomposition of multiple packing and covering, 1980.

[30] Dana Moshkovitz, et al. The Projection Games Conjecture and the NP-Hardness of ln n-Approximating Set-Cover, 2012, Theory Comput.

[31] Ola Svensson, et al. Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms, 2016, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[32] Aviad Rubinstein, et al. Hardness of approximate nearest neighbor search, 2018, STOC.

[33] Yuval Rabani, et al. On the Hardness of Approximating Multicut and Sparsest-Cut, 2005, 20th Annual IEEE Conference on Computational Complexity (CCC'05).

[34] Hiroshi Maehara. Dispersed points and geometric embedding of complete bipartite graphs, 1991, Discret. Comput. Geom.

[35] Subhash Khot, et al. Hardness results for coloring 3-colorable 3-uniform hypergraphs, 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings.

[36] Vladimir Nikiforov, et al. The number of cliques in graphs of given order and size, 2007, ArXiv 0710.2305.

[37] Luca Trevisan, et al. When Hamming Meets Euclid: The Approximability of Geometric TSP and Steiner Tree, 2000, SIAM J. Comput.

[38] Piotr Indyk, et al. Approximate clustering via core-sets, 2002, STOC '02.

[39] Hiroshi Maehara. Contact patterns of equal nonoverlapping spheres, 1985, Graphs Comb.

[40] Russell Impagliazzo, et al. Complexity of k-SAT, 1999, Proceedings. Fourteenth Annual IEEE Conference on Computational Complexity (Formerly: Structure in Complexity Theory Conference) (Cat.No.99CB36317).

[41] Kenneth W. Shum, et al. A low-complexity algorithm for the construction of algebraic-geometric codes better than the Gilbert-Varshamov bound, 2001, IEEE Trans. Inf. Theory.

[42] Pravesh Kothari, et al. Small-Set Expansion in Shortcode Graph and the 2-to-2 Conjecture, 2018, Electron. Colloquium Comput. Complex.

[43] Ravishankar Krishnaswamy, et al. The Hardness of Approximation of Euclidean k-Means, 2015, SoCG.

[44] Pasin Manurangsi. Tight Running Time Lower Bounds for Strong Inapproximability of Maximum k-Coverage, Unique Set Cover and Related Problems (via t-Wise Agreement Testing Theorem), 2020, SODA.

[45] Alexander Sidorenko, et al. What we know and what we do not know about Turán numbers, 1995, Graphs Comb.

[46] Carsten Lund, et al. Proof verification and hardness of approximation problems, 1992, Proceedings, 33rd Annual Symposium on Foundations of Computer Science.

[47] Alexander A. Razborov. On the minimal density of triangles in graphs, 2008.

[48] Guy Kindler, et al. Towards a proof of the 2-to-1 games conjecture?, 2018, Electron. Colloquium Comput. Complex.

[49] Subhash Khot, et al. Vertex cover might be hard to approximate to within 2−ε, 2003, 18th IEEE Annual Conference on Computational Complexity, 2003. Proceedings.

[50] Samir Khuller, et al. Greedy strikes back: improved facility location algorithms, 1998, SODA '98.

[51] David Steurer, et al. Analytical approach to parallel repetition, 2013, STOC.

[52] An Algorithmic Study of the Hypergraph Turán Problem, 2020, ArXiv.

[53] Vincent Cohen-Addad, et al. A Fast Approximation Scheme for Low-Dimensional k-Means, 2017, SODA.

[54] Satish Rao, et al. Approximation schemes for Euclidean k-medians and related problems, 1998, STOC '98.

[55] H. Stichtenoth, et al. On the Asymptotic Behaviour of Some Towers of Function Fields over Finite Fields, 1996.

[56] Carsten Lund, et al. On the hardness of approximating minimization problems, 1994, JACM.

[57] Venkatesan Guruswami, et al. A New Multilayered PCP and the Hardness of Hypergraph Vertex Cover, 2005, SIAM J. Comput.

[58] Julia Chuzhoy, et al. On Approximating Maximum Independent Set of Rectangles, 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[59] Venkatesan Guruswami, et al. Embeddings and non-approximability of geometric problems, 2003, SODA '03.

[60] Christian Reiher, et al. The clique density theorem, 2012, ArXiv 1212.2454.

[61] Pasin Manurangsi, et al. On Closest Pair in Euclidean Metric: Monochromatic is as Hard as Bichromatic, 2018, Combinatorica.

[62] Venkatesan Guruswami, et al. Every Permutation CSP of arity 3 is Approximation Resistant, 2009, 2009 24th Annual IEEE Conference on Computational Complexity.

[63] Irit Dinur, et al. The PCP theorem by gap amplification, 2006, STOC.

[64] Amin Saberi, et al. A new greedy approach for facility location problems, 2002, STOC '02.

[65] József Dénes, et al. Research problems, 1980, Eur. J. Comb.

[66] Amit Kumar, et al. Tight FPT Approximations for k-Median and k-Means, 2019, ICALP.

[67] Euiwoong Lee, et al. On Approximability of Clustering Problems Without Candidate Centers, 2020, SODA.

[68] Ragesh Jaiswal, et al. Hardness of Approximation of Euclidean k-Median, 2020, APPROX-RANDOM.

[69] Bundit Laekhanukit, et al. On the Complexity of Closest Pair via Polar-Pair of Point-Sets, 2016, SoCG.

[70] Aravind Srinivasan, et al. An Improved Approximation for k-Median and Positive Correlation in Budgeted Optimization, 2014, SODA.

[71] Per Austrin, et al. Global Cardinality Constraints Make Approximating Some Max-2-CSPs Harder, 2019, APPROX-RANDOM.