CLSH: Cluster-based Locality-Sensitive Hashing

Locality-sensitive hashing (LSH) typically consumes a large amount of memory in similarity search, which limits its scalability in large-scale applications. In this paper, we propose a novel cluster-based locality-sensitive hashing (CLSH) approach, which extends the conventional LSH framework to index and search large-scale, high-dimensional datasets. We first use a clustering algorithm to partition the raw feature dataset into clusters and map these clusters onto the nodes of a distributed computing cluster. LSH is then applied to construct an index for each cluster, and we present two criteria for choosing the cluster(s) to search in detail, in order to improve search quality. The proposed framework has the following properties. First, CLSH can cope with large-scale feature datasets. Second, the generated clusters guide the automatic mapping of the feature dataset onto a distributed computing cluster. Third, search time is substantially reduced by searching on multiple computing nodes. Experiments show that the proposed approach outperforms existing approaches in terms of efficiency and scalability.
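The pipeline described above (partition the data, build one LSH index per cluster, then pick which cluster(s) to search) can be illustrated with a minimal single-machine sketch. This is not the paper's implementation. It assumes k-means for the clustering step, random-hyperplane sign hashing for the LSH step, and "nearest centroids" as the cluster selection rule; the abstract does not specify the clustering algorithm, the hash family, or the two selection criteria. All names (`kmeans`, `SignLSH`, `ClusteredIndex`, `n_probe`) are hypothetical, and the distribution of clusters across computing nodes is omitted.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


class SignLSH:
    """One hash table keyed by the signs of random projections (assumed hash family)."""

    def __init__(self, dim, n_bits, rng):
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets = defaultdict(list)

    def key(self, v):
        return tuple(sum(w * x for w, x in zip(plane, v)) >= 0 for plane in self.planes)

    def add(self, v):
        self.buckets[self.key(v)].append(v)

    def candidates(self, q):
        return self.buckets.get(self.key(q), [])


class ClusteredIndex:
    """Cluster first, then keep a separate LSH table per cluster."""

    def __init__(self, points, k, n_bits=4, seed=0):
        rng = random.Random(seed)
        self.centroids = kmeans(points, k, seed=seed)
        self.tables = [SignLSH(len(points[0]), n_bits, rng) for _ in range(k)]
        for p in points:
            self.tables[self.nearest_clusters(p, 1)[0]].add(p)

    def nearest_clusters(self, q, n_probe):
        """Assumed selection rule: the n_probe clusters with the closest centroids."""
        order = sorted(range(len(self.centroids)),
                       key=lambda c: sq_dist(q, self.centroids[c]))
        return order[:n_probe]

    def query(self, q, n_probe=1):
        """Return the closest candidate found in the probed clusters, or None."""
        cands = []
        for c in self.nearest_clusters(q, n_probe):
            cands.extend(self.tables[c].candidates(q))
        return min(cands, key=lambda p: sq_dist(q, p)) if cands else None


# Usage on synthetic 2-D data drawn around three assumed centers.
rng = random.Random(1)
data = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
        for cx, cy in [(0, 0), (10, 10), (-10, 10)] for _ in range(50)]
index = ClusteredIndex(data, k=3)
print(index.query([9.5, 10.2], n_probe=2))
```

Because each cluster owns its own table, only the tables of the probed clusters are touched at query time. In a distributed setting, each table could in principle live on a different node, which is the property the abstract relies on for scalability. Raising `n_probe` trades extra search work for a better chance of finding the true neighbor.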
