Locality-sensitive Hashing without False Negatives

We consider a new construction of locality-sensitive hash functions for Hamming space that is covering in the sense that it is guaranteed to produce a collision for every pair of vectors within a given radius r. The construction is efficient in the sense that the expected number of hash collisions between vectors at distance cr, for a given c > 1, comes close to that of the best possible data-independent LSH without the covering guarantee, namely, the seminal LSH construction of Indyk and Motwani (STOC '98). The efficiency of the new construction essentially matches their bound if cr = log(n)/k, where n is the number of points in the data set and k ∈ N, and differs from it by at most a factor ln(4) in the exponent for general values of cr. As a consequence, LSH-based similarity search in Hamming space can avoid the problem of false negatives at little or no cost in efficiency.
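To make the covering property concrete, the following minimal sketch uses the classical block-partition (pigeonhole) scheme, which is not the construction of this paper: the d coordinates are split into r + 1 blocks, and two vectors at Hamming distance at most r must agree on at least one whole block. The function names `block_hashes`, `collides`, and `hamming` are hypothetical, and the sketch assumes d > r and makes no claim about efficiency at distance cr.

```python
from itertools import product

def hamming(x, y):
    """Hamming distance between two equal-length bit tuples."""
    return sum(a != b for a, b in zip(x, y))

def block_hashes(x, r):
    """Split x into r + 1 contiguous blocks; each (index, block) pair is one hash value.

    Illustrative pigeonhole scheme only, assuming len(x) > r.
    """
    d = len(x)
    bounds = [i * d // (r + 1) for i in range(r + 2)]
    return [(i, x[bounds[i]:bounds[i + 1]]) for i in range(r + 1)]

def collides(x, y, r):
    """True if x and y share a hash value under at least one of the r + 1 functions."""
    return any(hx == hy for hx, hy in zip(block_hashes(x, r), block_hashes(y, r)))

if __name__ == "__main__":
    d, r = 6, 2
    points = list(product((0, 1), repeat=d))
    # Exhaustively confirm the covering property: no false negatives within radius r.
    covering = all(
        collides(x, y, r)
        for x in points
        for y in points
        if hamming(x, y) <= r
    )
    print("covering within radius", r, ":", covering)
```

With at most r differing coordinates and r + 1 disjoint blocks, at least one block contains no differing coordinate, so a collision is guaranteed. Pairs at distance greater than r may or may not collide, and the expected number of such collisions is the efficiency measure the abstract compares against.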

[1] Piotr Indyk et al. Similarity Search in High Dimensions via Hashing, 1999, VLDB.

[2] Richard Cole et al. Dictionary matching and indexing with errors and don't cares, 2004, STOC '04.

[3] Torben Hagerup et al. Sorting and Searching on the Word RAM, 1998, STACS.

[4] Nikolai N. Kuzjurin et al. On the difference between asymptotically good packings and coverings, 1995, Eur. J. Comb.

[5] Alexandr Andoni et al. Optimal Data-Dependent Hashing for Approximate Near Neighbors, 2015, STOC.

[6] Piotr Indyk et al. Approximate nearest neighbors: towards removing the curse of dimensionality, 1998, STOC '98.

[7] Ryan Williams et al. A new algorithm for optimal 2-constraint satisfaction and its implications, 2005, Theor. Comput. Sci.

[8] Rina Panigrahy et al. Entropy based nearest neighbor search in high dimensions, 2005, SODA '06.

[9] Nikolai N. Kuzjurin. Explicit Constructions Of Rödl's Asymptotically Good Packings And Coverings, 2000, Comb. Probab. Comput.

[10] Alexandr Andoni et al. Beyond Locality-Sensitive Hashing, 2013, SODA.

[11] Marvin Minsky et al. Perceptrons: An Introduction to Computational Geometry, 1969.

[12] David J. Fleet et al. Fast search in Hamming space with multi-index hashing, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13] G. Kucherov et al. Multiseed lossless filtration, 2009, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[14] Piotr Indyk et al. Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality, 2012, Theory Comput.

[15] Raghav Kaushik et al. Efficient exact set-similarity joins, 2006, VLDB.

[16] F. Frances Yao et al. Multi-index hashing for information retrieval, 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[17] Ryan Williams et al. Probabilistic Polynomials and Hamming Nearest Neighbors, 2015, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.

[18] Mikhail Kapralov. Smooth Tradeoffs between Insert and Query Complexity in Nearest Neighbor Search, 2015, PODS.

[19] Yi Wu et al. Optimal Lower Bounds for Locality-Sensitive Hashing (Except When q is Tiny), 2014, TOCT.

[20] Marvin Minsky et al. Perceptrons: expanded edition, 1988.