Lower bounds on locality sensitive hashing

Given a metric space (X, d_X), c ≥ 1, r > 0, and p, q ∈ [0,1], a distribution over mappings H : X → ℕ is called an (r, cr, p, q)-sensitive hash family if any two points in X at distance at most r are mapped by H to the same value with probability at least p, and any two points at distance greater than cr are mapped by H to the same value with probability at most q. This notion was introduced by Indyk and Motwani in 1998 as the basis for an efficient approximate nearest neighbor search algorithm, and has since been used extensively for this purpose. The performance of these algorithms is governed by the parameter ρ = log(1/p)/log(1/q), and constructing hash families with small ρ automatically yields improved nearest neighbor algorithms. Here we show that for X = ℓ₁ it is impossible to achieve ρ ≤ 1/(2c). This almost matches the construction of Indyk and Motwani, which achieves ρ ≤ 1/c.
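To make p, q and ρ concrete, here is a minimal sketch that computes them for bit sampling on the Hamming cube {0,1}^d, the kind of family behind the Indyk and Motwani upper bound quoted above. Assumptions, none of them taken from the paper: points are 0/1 vectors (on {0,1}^d Hamming distance coincides with ℓ₁ distance), a random hash is h(x) = x_i with i drawn uniformly from the d coordinates, and the helper names `bit_sampling_probabilities`, `rho` and `empirical_collision_rate` are hypothetical. Under these assumptions Pr[h(x) = h(y)] = 1 − dist(x, y)/d, so p = 1 − r/d and q = 1 − cr/d.

```python
import math
import random


def bit_sampling_probabilities(d: int, r: float, c: float) -> tuple[float, float]:
    """Return (p, q) for bit sampling on {0,1}^d with radius r and factor c.

    Assumed family: h(x) = x[i] with i uniform in range(d), so
    Pr[h(x) == h(y)] = 1 - dist(x, y) / d, where dist is Hamming (= l1) distance.
    """
    if c < 1 or r <= 0 or c * r >= d:
        raise ValueError("need c >= 1, r > 0 and c * r < d")
    p = 1 - r / d      # collision probability is at least p when dist <= r
    q = 1 - c * r / d  # collision probability is below q when dist > c * r
    return p, q


def rho(p: float, q: float) -> float:
    """The exponent rho = log(1/p) / log(1/q); smaller means faster search."""
    return math.log(1 / p) / math.log(1 / q)


def empirical_collision_rate(x: list[int], y: list[int],
                             trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of Pr[h(x) == h(y)] over random bit-sampling hashes."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    hits = 0
    for _ in range(trials):
        i = rng.randrange(len(x))  # draw a fresh hash function h(z) = z[i]
        hits += x[i] == y[i]
    return hits / trials


if __name__ == "__main__":
    d, r, c = 1000, 50, 2.0
    p, q = bit_sampling_probabilities(d, r, c)
    print(f"p={p:.3f} q={q:.3f} rho={rho(p, q):.4f} 1/c={1 / c:.4f}")

    # Two points at Hamming distance exactly r: the estimate should be close to p.
    x = [0] * d
    y = [1] * r + [0] * (d - r)
    rng = random.Random(0)
    print(f"empirical collision rate: {empirical_collision_rate(x, y, 20_000, rng):.3f}")
```

With d = 1000, r = 50 and c = 2 this gives p = 0.95, q = 0.9 and ρ ≈ 0.487, just below 1/c = 0.5. The bound ρ ≤ 1/c holds for every admissible choice: writing x = r/d, both logarithms are negative, so ρ ≤ 1/c is equivalent to c·ln(1 − x) ≥ ln(1 − cx), and the difference c·ln(1 − x) − ln(1 − cx) vanishes at x = 0 and has derivative c(c − 1)x / ((1 − x)(1 − cx)) ≥ 0 for 0 ≤ x < 1/c.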

[1] A. Bonami. Étude des coefficients de Fourier des fonctions de $L^p(G)$, 1970.

[2] W. Beckner. Inequalities in Fourier analysis, 1975.

[3] Piotr Indyk, et al. Approximate nearest neighbors: towards removing the curse of dimensionality, 1998, STOC '98.

[4] Sariel Har-Peled. A replacement for Voronoi diagrams of near linear size, 2001, Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS 2001).

[5] Nicole Immorlica, et al. Locality-sensitive hashing scheme based on p-stable distributions, 2004, SCG '04.

[6] Rina Panigrahy. Entropy based nearest neighbor search in high dimensions, 2005, SODA '06.