Optimal hash functions for approximate closest pairs on the n-cube

One way to find closest pairs in large datasets is to use hash functions. In recent years, locality-sensitive hash functions have been given for various metrics; for example, projecting the n-cube onto k bits is a simple hash function that performs well. In this paper we investigate alternatives to projection. For various parameters, hash functions given by complete decoding algorithms for codes work better than projection, and asymptotically, random codes perform better than projection.
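
The two kinds of hash function mentioned above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the construction analysed in the paper: points of the n-cube are stored as Python integers, the function names (`projection_hash`, `hamming74_decode_hash`, `candidate_pairs`) are hypothetical, and the [7,4] Hamming code is chosen only because its complete decoder is a few lines of syndrome arithmetic.

```python
def hamming_distance(a, b):
    """Number of coordinates in which two points of the cube differ."""
    return bin(a ^ b).count("1")


def projection_hash(x, coords):
    """Project point x onto the chosen coordinates (k = len(coords) bits)."""
    h = 0
    for i, c in enumerate(coords):
        h |= ((x >> c) & 1) << i
    return h


def hamming74_decode_hash(x):
    """Hash a point of the 7-cube to its nearest [7,4] Hamming codeword.

    Illustrative assumption: bit position p (0-based) corresponds to the
    parity-check column p + 1, so the syndrome is the XOR of the 1-based
    positions of the set bits. A nonzero syndrome names the bit to flip.
    """
    syndrome = 0
    for pos in range(7):
        if (x >> pos) & 1:
            syndrome ^= pos + 1
    if syndrome:
        x ^= 1 << (syndrome - 1)
    return x


def candidate_pairs(points, hash_fn):
    """Return index pairs of points that fall into the same hash bucket."""
    buckets = {}
    for idx, p in enumerate(points):
        buckets.setdefault(hash_fn(p), []).append(idx)
    pairs = []
    for members in buckets.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pairs.append((members[i], members[j]))
    return pairs
```

Both hashes send the 7-cube to 2^k buckets (k = 3 for a three-coordinate projection, k = 4 for the Hamming decoder); only pairs sharing a bucket are then compared with `hamming_distance`. For example, `candidate_pairs(points, lambda p: projection_hash(p, [1, 2, 3]))` and `candidate_pairs(points, hamming74_decode_hash)` produce the candidate lists for the two schemes on the same data.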
