Robust and Efficient Locality Sensitive Hashing for Nearest Neighbor Search in Large Data Sets

Locality sensitive hashing (LSH) has been used extensively as a basis for many data retrieval applications. However, previous approaches, such as random projection and multi-probe hashing, may exhibit query complexity as high as Θ(n) when the underlying data distribution is highly skewed. This is due to the imbalance in the number of points stored in each bucket, which leads to slow query times in large data sets. In this paper, we introduce a distribution-free LSH algorithm that addresses this problem by maintaining a nearly uniform number of points per bucket. As a consequence, our algorithm allows one to use fewer hash tables, and is hence memory-efficient, while achieving high accuracy. Through extensive experiments, we show that our algorithm accurately retrieves nearest neighbors faster than other standard LSH algorithms do in large data sets, and maintains a nearly uniform number of points per bucket.
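
The bucket imbalance described above can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this paper: the data set, the parameter values, and the helper names (`project`, `bucket_sizes`, `median`) are all hypothetical, and the median-based thresholds are only an assumed stand-in for "some data-dependent way of balancing buckets". The sketch hashes a heavily skewed point set with sign random projection (thresholds fixed at zero) and then with per-direction median thresholds, and compares the size of the largest bucket in each case.

```python
import random
from collections import Counter

def project(point, direction):
    """Inner product of a point with a projection direction."""
    return sum(p * d for p, d in zip(point, direction))

def bucket_sizes(points, directions, thresholds):
    """Hash each point to a tuple of bits and count points per bucket."""
    counts = Counter()
    for x in points:
        key = tuple(project(x, d) > t for d, t in zip(directions, thresholds))
        counts[key] += 1
    return counts

def median(values):
    """Upper median of a list of numbers."""
    s = sorted(values)
    return s[len(s) // 2]

rng = random.Random(0)
dim, n_bits, n = 8, 4, 2000  # illustrative values, not taken from the paper

# Skewed data (assumed for illustration): about 95% of the points sit in a
# tight cluster away from the origin, the rest are spread widely.
center = [5.0] * dim
points = [
    [c + rng.gauss(0, 0.01) for c in center] if rng.random() < 0.95
    else [rng.gauss(0, 5.0) for _ in range(dim)]
    for _ in range(n)
]

# Random Gaussian projection directions, one per hash bit.
directions = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

# Sign random projection: every threshold is fixed at zero.
fixed = bucket_sizes(points, directions, [0.0] * n_bits)

# Assumed data-dependent variant: threshold each bit at the median projection,
# so that each bit splits the data set roughly in half.
medians = [median([project(x, d) for x in points]) for d in directions]
balanced = bucket_sizes(points, directions, medians)

largest_fixed = max(fixed.values())
largest_balanced = max(balanced.values())
print("largest bucket, zero thresholds:  ", largest_fixed)
print("largest bucket, median thresholds:", largest_balanced)
```

With zero thresholds, the tight cluster typically falls on one side of every hyperplane, so most of the data lands in a single bucket and a query that hashes there must scan a number of points proportional to n. With median thresholds, no bucket can hold more than about half of the points, because the first bit alone splits the data at its median.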