Paper Information - Robust and Efficient Locality Sensitive Hashing for Nearest Neighbor Search in Large Data Sets

Robust and Efficient Locality Sensitive Hashing for Nearest Neighbor Search in Large Data Sets

Locality sensitive hashing (LSH) has been used extensively as a basis for many data retrieval applications. However, previous approaches, such as random projection and multi-probe hashing, may exhibit high query complexity of up to Θ(n) when the underlying data distribution is highly skewed. This is due to the imbalance in the number of data stored per each bucket, which leads to slow query time in large data sets. In this paper, we introduce a distribution-free LSH algorithm that addresses this problem by maintaining nearly uniform number of points per bucket. As a consequence, our algorithm allows one to reduce the number of hash tables, and is hence memory-efficient, while achieving high accuracy. Through extensive experiments, we show that our algorithm accurately retrieves nearest neighbors faster than other standard LSH algorithms do in large data sets, and maintains nearly uniform number of per-bucket points.
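The bucket-imbalance problem described in the abstract can be illustrated with a small sketch. The code below is not the paper's algorithm. It is a plain sign-random-projection LSH written for illustration, and every name and parameter in it (`make_hyperplanes`, `hash_point`, `bucket_sizes`, the dimensions, the cluster center, the noise levels) is an assumption. It hashes a tightly clustered (skewed) data set and an isotropic one with the same hyperplanes and compares the size of the largest bucket. A query that lands in a bucket holding most of the data has to scan close to n points.

```python
import random
from collections import Counter


def make_hyperplanes(num_bits, dim, rng):
    """Draw one random Gaussian hyperplane normal per hash bit (illustrative helper)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_point(point, hyperplanes):
    """Sign random projection: bit i is 1 when the point lies on the
    non-negative side of hyperplane i."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, point)) >= 0 else 0
        for plane in hyperplanes
    )


def bucket_sizes(points, hyperplanes):
    """Count how many points fall into each bucket (hash key)."""
    return Counter(hash_point(p, hyperplanes) for p in points)


rng = random.Random(0)
dim, num_bits, n = 8, 6, 2000  # assumed toy sizes
planes = make_hyperplanes(num_bits, dim, rng)

# Skewed data: a tight cluster far from the origin (assumed toy distribution).
center = [5.0] * dim
skewed = [[c + rng.gauss(0.0, 0.01) for c in center] for _ in range(n)]

# Spread-out data: isotropic Gaussian around the origin.
spread = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

skewed_buckets = bucket_sizes(skewed, planes)
spread_buckets = bucket_sizes(spread, planes)
largest_skewed = max(skewed_buckets.values())
largest_spread = max(spread_buckets.values())

print("largest bucket, skewed data:", largest_skewed, "of", n)
print("largest bucket, spread data:", largest_spread, "of", n)
```

In this toy setting the clustered data tends to collapse into very few buckets, while the isotropic data is divided among many more. That is the kind of imbalance a scheme keeping a nearly uniform number of points per bucket is meant to avoid.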

Byungkon Kang
