Distribution-Aware Locality Sensitive Hashing

Locality Sensitive Hashing (LSH) has been popularly used in content-based search systems. There exist two main categories of LSH methods: one is to index the original data in an effective way to accelerate search process; the other one is to embed the high-dimensional data into hamming space and perform bit-wise operations to search similar objects. In this paper, we propose a new LSH scheme, called Distribution-Aware LSH (DALSH), to address the problem of lacking adaptation to real data, which is the intrinsic limitation of most LSH methods belong to the former category. In DALSH, a given dataset is embedded into a low-dimensional space with projection vectors learned from data, followed by deriving hash functions from the distribution of the dimension-reduced data. We also present a multi-probe strategy to improve the query performance. Experimental comparisons with the state-of-the-art LSH methods on two high-dimensional datasets demonstrate the efficacy of DALSH.

[1]  Jonathan Goldstein,et al.  When Is ''Nearest Neighbor'' Meaningful? , 1999, ICDT.

[2]  Olivier Buisson,et al.  A posteriori multi-probe locality sensitive hashing , 2008, ACM Multimedia.

[3]  Trevor Darrell,et al.  Nearest-Neighbor Methods in Learning and Vision , 2008, IEEE Trans. Neural Networks.

[4]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[5]  Richard I. Hartley,et al.  Optimised KD-trees for fast image descriptor matching , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[7]  Mayank Bawa,et al.  LSH forest: self-tuning indexes for similarity search , 2005, WWW '05.

[8]  Geoffrey E. Hinton,et al.  Semantic hashing , 2009, Int. J. Approx. Reason..

[9]  F. Gianfelici,et al.  Nearest-Neighbor Methods in Learning and Vision (Shakhnarovich, G. et al., Eds.; 2006) [Book review] , 2008 .

[10]  Rongrong Ji,et al.  Supervised hashing with kernels , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Zhe Wang,et al.  Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search , 2007, VLDB.

[12]  Cordelia Schmid,et al.  Query adaptative locality sensitive hashing , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[13]  Laurent Amsaleg,et al.  NV-Tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.

[15]  Piotr Indyk,et al.  Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality , 2012, Theory Comput..

[16]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Wei Liu,et al.  Hashing with Graphs , 2011, ICML.

[18]  Laurent Amsaleg,et al.  Locality sensitive hashing: A comparison of hash function types and querying mechanisms , 2010, Pattern Recognit. Lett..

[19]  Svetlana Lazebnik,et al.  Locality-sensitive binary codes from shift-invariant kernels , 2009, NIPS.

[20]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[21]  Shih-Fu Chang,et al.  Sequential Projection Learning for Hashing with Compact Codes , 2010, ICML.

[22]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[23]  Panos Kalnis,et al.  Efficient and accurate nearest neighbor and closest pair search in high-dimensional space , 2010, TODS.

[24]  Jon Louis Bentley,et al.  K-d trees for semidynamic point sets , 1990, SCG '90.

[25]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[26]  Marios Hadjieleftheriou,et al.  R-Trees - A Dynamic Index Structure for Spatial Searching , 2008, ACM SIGSPATIAL International Workshop on Advances in Geographic Information Systems.

[27]  Kristen Grauman,et al.  Kernelized Locality-Sensitive Hashing , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Xian-Sheng Hua,et al.  Towards a Relevant and Diverse Search of Social Images , 2010, IEEE Transactions on Multimedia.

[29]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .