Semi-supervised hashing for scalable image retrieval

Large scale image search has recently attracted considerable attention due to easy availability of huge amounts of data. Several hashing methods have been proposed to allow approximate but highly efficient search. Unsupervised hashing methods show good performance with metric distances but, in image search, semantic similarity is usually given in terms of labeled pairs of images. There exist supervised hashing methods that can handle such semantic similarity but they are prone to overfitting when labeled data is small or noisy. Moreover, these methods are usually very slow to train. In this work, we propose a semi-supervised hashing method that is formulated as minimizing empirical error on the labeled data while maximizing variance and independence of hash bits over the labeled and unlabeled data. The proposed method can handle both metric as well as semantic similarity. The experimental results on two large datasets (up to one million samples) demonstrate its superior performance over state-of-the-art supervised and unsupervised methods.

[1]  Jon Louis Bentley,et al.  An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1977, TOMS.

[2]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[3]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[4]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[5]  Geoffrey E. Hinton,et al.  Neighbourhood Components Analysis , 2004, NIPS.

[6]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.

[7]  Trevor Darrell,et al.  Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing) , 2006 .

[8]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[9]  Richard I. Hartley,et al.  Optimised KD-trees for fast image descriptor matching , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Prateek Jain,et al.  Fast image search for learned metrics , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[12]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[13]  Antonio Torralba,et al.  Small codes and large image databases for recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Shree K. Nayar,et al.  What Is a Good Nearest Neighbors Algorithm for Finding Similar Patches in Images? , 2008, ECCV.

[15]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[16]  Trevor Darrell,et al.  Learning to Hash with Binary Reconstructive Embeddings , 2009, NIPS.

[17]  Kristen Grauman,et al.  Kernelized locality-sensitive hashing for scalable image search , 2009, 2009 IEEE 12th International Conference on Computer Vision.