Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search
Abstract: Similarity indices for high-dimensional data are very desirable for building content-based search systems for feature-rich data such as audio, images, videos, and other sensor data. Recently, locality sensitive hashing (LSH) and its variations have been proposed as indexing techniques for approximate similarity search. A significant drawback of these approaches is that they require a large number of hash tables to achieve good search quality. This paper proposes a new indexing scheme called multi-probe LSH that overcomes this drawback. Multi-probe LSH is built on the well-known LSH technique, but it intelligently probes multiple buckets in a hash table that are likely to contain query results. Our method is inspired by and improves upon recent theoretical work on entropy-based LSH, which was designed to reduce the space requirement of the basic LSH method. We have implemented the multi-probe LSH method and evaluated the implementation with two different high-dimensional datasets. Our evaluation shows that the multi-probe LSH method substantially improves upon previously proposed methods in both space and time efficiency. To achieve the same search quality, multi-probe LSH has time efficiency similar to that of the basic LSH method while reducing the number of hash tables by an order of magnitude. Compared with the entropy-based LSH method at the same search quality, multi-probe LSH uses less query time and 5 to 8 times fewer hash tables.
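The core idea of probing several buckets per hash table can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the paper: the class name `MultiProbeTable`, the parameters `num_hashes` and `width`, and the probing rule (visit the query's home bucket, then every bucket whose key differs by one in a single coordinate) are assumptions chosen for illustration. The abstract does not specify how buckets are ranked, so the sketch uses this simple neighbourhood instead.

```python
import math
import random
from collections import defaultdict


class MultiProbeTable:
    """One LSH table whose lookups also visit neighbouring buckets (illustrative only)."""

    def __init__(self, dim, num_hashes=4, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        # Assumed hash family: Gaussian projections a_i with uniform offsets b_i.
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hashes)]
        self.b = [rng.uniform(0.0, width) for _ in range(num_hashes)]
        self.buckets = defaultdict(list)

    def key(self, v):
        # h_i(v) = floor((a_i . v + b_i) / width); the bucket key is the tuple of all h_i.
        return tuple(
            math.floor((sum(x * y for x, y in zip(a, v)) + b) / self.width)
            for a, b in zip(self.a, self.b)
        )

    def insert(self, item_id, v):
        self.buckets[self.key(v)].append(item_id)

    def probe_keys(self, v):
        # Home bucket first, then each key that differs by +/-1 in exactly one slot.
        home = self.key(v)
        yield home
        for i in range(len(home)):
            for delta in (-1, 1):
                yield home[:i] + (home[i] + delta,) + home[i + 1:]

    def query(self, v):
        # Collect candidate ids from all probed buckets, without duplicates.
        seen = []
        for k in self.probe_keys(v):
            for item_id in self.buckets.get(k, ()):
                if item_id not in seen:
                    seen.append(item_id)
        return seen
```

With `num_hashes` hash functions, each query in this sketch inspects `1 + 2 * num_hashes` buckets in a single table, which is how extra probes can stand in for extra hash tables. The candidates returned by `query` would still need to be ranked by true distance to the query point.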