Related Papers

Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search

Abstract: Similarity indices for high-dimensional data are very desirable for building content-based search systems for feature-rich data such as audio, images, videos, and other sensor data. Recently, locality sensitive hashing (LSH) and its variations have been proposed as indexing techniques for approximate similarity search. A significant drawback of these approaches is that they require a large number of hash tables to achieve good search quality. This paper proposes a new indexing scheme called multi-probe LSH that overcomes this drawback. Multi-probe LSH is built on the well-known LSH technique, but it intelligently probes multiple buckets in a hash table that are likely to contain query results. Our method is inspired by and improves upon recent theoretical work on entropy-based LSH, which was designed to reduce the space requirement of the basic LSH method. We have implemented the multi-probe LSH method and evaluated the implementation with two different high-dimensional datasets. Our evaluation shows that the multi-probe LSH method substantially improves upon previously proposed methods in both space and time efficiency. To achieve the same search quality, multi-probe LSH has time efficiency similar to that of the basic LSH method while reducing the number of hash tables by an order of magnitude. Compared with the entropy-based LSH method at the same search quality, multi-probe LSH uses less query time and 5 to 8 times fewer hash tables.
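The core idea in the abstract, probing several buckets of one hash table instead of building many tables, can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes a Gaussian (p-stable) hash of the form floor((a·v + b) / w), and it uses a simplified probing rule: shift one hash coordinate by ±1, trying first the coordinates whose projection lies closest to a bucket boundary. The class name `MultiProbeLSHTable` and the parameters `num_hashes`, `width`, and `num_probes` are hypothetical names chosen for this illustration.

```python
import math
import random
from collections import defaultdict


class MultiProbeLSHTable:
    """A single LSH table with a simplified multi-probe query (illustrative only)."""

    def __init__(self, dim, num_hashes=4, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One Gaussian projection vector and one random offset per hash function.
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_hashes)]
        self.b = [rng.uniform(0.0, width) for _ in range(num_hashes)]
        self.buckets = defaultdict(list)

    def _projections(self, v):
        # Real-valued (a . v + b) / w for each hash function, before flooring.
        return [(sum(ai * vi for ai, vi in zip(a, v)) + b) / self.width
                for a, b in zip(self.a, self.b)]

    def _key(self, v):
        return tuple(math.floor(p) for p in self._projections(v))

    def insert(self, item_id, v):
        self.buckets[self._key(v)].append(item_id)

    def probe_keys(self, q, num_probes):
        """Return the home bucket key followed by up to num_probes neighbor keys."""
        proj = self._projections(q)
        home = tuple(math.floor(p) for p in proj)
        candidates = []
        for i, p in enumerate(proj):
            frac = p - home[i]                      # position inside the slot, in [0, 1)
            candidates.append((frac, i, -1))        # distance to the lower boundary
            candidates.append((1.0 - frac, i, +1))  # distance to the upper boundary
        candidates.sort()                           # nearest boundaries first
        keys = [home]
        for _, i, delta in candidates[:num_probes]:
            key = list(home)
            key[i] += delta
            keys.append(tuple(key))
        return keys

    def query(self, q, num_probes=0):
        """Collect candidate ids from the home bucket and the probed buckets."""
        found = []
        for key in self.probe_keys(q, num_probes):
            found.extend(self.buckets.get(key, []))
        return found


table = MultiProbeLSHTable(dim=3, num_hashes=4, width=4.0, seed=1)
table.insert("p0", [0.5, -1.2, 3.3])
candidates = table.query([0.6, -1.1, 3.2], num_probes=3)
```

With `num_probes=0` the query reduces to a basic single-bucket lookup. Raising `num_probes` widens the search within the same table, which is the trade the abstract describes: more probes per table in exchange for fewer tables. A real system would still rank the returned candidates by true distance to the query.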

Cited by
Search problems in cryptography: from fingerprinting to lattice sieving
2016
Similarity Search and Applications: 13th International Conference, SISAP 2020, Copenhagen, Denmark, September 30 – October 2, 2020, Proceedings
Lecture Notes in Computer Science
2020
A multicontext-adaptive query creation and search system for large-scale image and video data(審査報告)
2016
On the Merge of k-NN Graph
IEEE Transactions on Big Data
2019
k-Nearest Neighbors algorithm based on weak bit implementation on Enhanced Vote Count circuit
2015 IEEE 17th International Workshop on Multimedia Signal Processing (MMSP)
2015
A Survey on Different File System Approach
2015
Revisiting kd-tree for Nearest Neighbor Search
KDD
2019
Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors
SODA
2016
Privacy-Preserving Image Retrieval in IoT-Cloud
2016 IEEE Trustcom/BigDataSE/ISPA
2016
Precise Detection of Content Reuse in the Web
CCRV
2019
Deep Multi-loss Hashing Network for Palmprint Retrieval and Recognition
2021 IEEE International Joint Conference on Biometrics (IJCB)
2021
A Progressive Method for Detecting Duplication Entities Based on Bloom Filters
2017 14th Web Information Systems and Applications Conference (WISA)
2017
Web-scale Content Reuse Detection (extended)
2014
Liquid Silicon-Monona: A Reconfigurable Memory-Oriented Computing Fabric with Scalable Multi-Context Support
ASPLOS
2018
An efficient video copy detection method based on video signature
2009 IEEE International Conference on Automation and Logistics
2009
Pyro: A Spatial-Temporal Big-Data Storage System
USENIX Annual Technical Conference
2015
An Efficient Method for Video Similarity Search with Video Signature
2010 International Conference on Computational and Information Sciences
2010
Large-Scale Machine Learning for Classification and Search
2012
Effective hashing for large-scale multimedia search
SIGMOD'13 PhD Symposium
2013
First-Take-All: Temporal Order-Preserving Hashing for 3D Action Videos
2018