The complexity of LSH feasibility

In this paper we study the complexity of the following feasibility problem: given an $n \times n$ similarity matrix $S$ as input, is there a locality sensitive hash (LSH) for $S$? We show that the LSH feasibility problem is NP-hard even in the following strong promise version: either $S$ admits an LSH, or $S$ is at $\ell_1$-distance at least $n^{2-\epsilon}$ from every similarity that admits an LSH. We complement this hardness result by providing an $\tilde{O}(3^n)$ algorithm for the LSH feasibility problem, which improves upon the naive $n^{\Theta(n)}$-time algorithm; we prove that this running time is tight, modulo constants, under the Exponential Time Hypothesis.
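To make the objects in the abstract concrete: on $n$ points, a hash function matters only through the partition of the points into buckets that it induces, so an LSH for $S$ is a probability distribution over partitions with $S_{ij} = \Pr[h(i) = h(j)]$. The Python sketch below is a hypothetical illustration, not the paper's algorithm. It enumerates the set partitions of $\{0, \dots, n-1\}$, computes the similarity realized by a given distribution over partitions, and measures the $\ell_1$-distance used in the promise version. The number of partitions is the Bell number $B_n = n^{\Theta(n)}$. We assume here that the naive algorithm is one that searches over weights on all $B_n$ partitions, for instance with a linear feasibility test; that test is not shown. The names `set_partitions`, `collision_matrix`, and `l1_distance` are our own.

```python
from fractions import Fraction


def set_partitions(n):
    """Yield every partition of {0, ..., n-1} as a tuple of block labels in
    restricted-growth form (label[i] <= 1 + max(label[:i])), so that each
    partition appears exactly once. There are B_n of them (Bell number)."""
    if n == 0:
        yield ()
        return

    def extend(prefix, max_label):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        # The next point joins an existing block or opens a new one.
        for lab in range(max_label + 2):
            prefix.append(lab)
            yield from extend(prefix, max(max_label, lab))
            prefix.pop()

    yield from extend([0], 0)


def collision_matrix(dist, n):
    """Similarity realized by a distribution over partitions (hypothetical
    helper): S[i][j] = Pr[i and j fall in the same block]."""
    S = [[Fraction(0)] * n for _ in range(n)]
    for labels, p in dist.items():
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    S[i][j] += p
    return S


def l1_distance(A, B):
    """Entrywise l1-distance between two n x n matrices."""
    return sum(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


print(len(list(set_partitions(4))))  # 15, the Bell number B_4

# Hypothetical LSH on 3 points: with probability 1/2 all three collide,
# otherwise all three land in separate buckets.
half = Fraction(1, 2)
dist = {(0, 0, 0): half, (0, 1, 2): half}
S = collision_matrix(dist, 3)  # 1 on the diagonal, 1/2 elsewhere

ones = [[Fraction(1)] * 3 for _ in range(3)]  # "everything always collides"
print(l1_distance(S, ones))  # 3
```

Exact `Fraction` arithmetic is used so that equalities such as $S_{ij} = 1/2$ can be checked without floating-point error. Deciding feasibility for an arbitrary $S$ then amounts to asking whether nonnegative weights summing to 1 on these partitions reproduce $S$ exactly.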
