Margin-based two-stage supervised hashing for image retrieval

Similarity-preserving hashing is a widely used method for nearest neighbor search in large-scale image retrieval. Supervised hashing methods have recently become appealing because they learn compact hash codes with fewer bits by incorporating supervised information. In this paper, we propose a new two-stage supervised hashing method that decomposes the hash learning process into a stage of learning approximate hash codes followed by a stage of learning hash functions. In the first stage, we propose a margin-based objective to find approximate hash codes such that the pair of hash codes associated with a pair of similar (dissimilar) images has a sufficiently small (large) Hamming distance. This objective results in a challenging optimization problem, which we solve efficiently with a coordinate descent algorithm. In the second stage, we use convolutional neural networks to learn hash functions. We conduct extensive evaluations on several benchmark datasets with different kinds of images. The results show that the proposed margin-based hashing method substantially improves upon state-of-the-art supervised and unsupervised hashing methods.
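To make the first stage concrete, the following is a minimal sketch of a margin-based objective over pairwise Hamming distances, minimized by greedy single-bit flips. It is an illustration under stated assumptions, not the algorithm of the paper: the hinge form of the loss, the margins `m_sim` and `m_dis`, and the function names are all hypothetical choices made for this example, and the bit-flip loop is a simple stand-in for the coordinate descent procedure mentioned above.

```python
import numpy as np

def hamming(H):
    """Pairwise Hamming distances between rows of H, with entries in {-1, +1}."""
    q = H.shape[1]
    # For +/-1 codes, the inner product equals q - 2 * (Hamming distance).
    return (q - H @ H.T) / 2

def margin_loss(H, S, m_sim, m_dis):
    """Assumed hinge-style loss: S[i, j] = 1 marks a similar pair, 0 a dissimilar pair.

    Similar pairs are penalized when their distance exceeds m_sim,
    dissimilar pairs when their distance falls below m_dis.
    """
    D = hamming(H)
    sim_penalty = np.maximum(0.0, D - m_sim)
    dis_penalty = np.maximum(0.0, m_dis - D)
    off_diag = ~np.eye(len(H), dtype=bool)  # ignore each code paired with itself
    return float(np.sum(np.where(S == 1, sim_penalty, dis_penalty)[off_diag]))

def greedy_bit_flips(H, S, m_sim, m_dis, sweeps=5):
    """Coordinate-wise search: flip one bit at a time, keep the flip only if the loss drops."""
    H = H.copy()
    best = margin_loss(H, S, m_sim, m_dis)
    for _ in range(sweeps):
        improved = False
        for i in range(H.shape[0]):
            for k in range(H.shape[1]):
                H[i, k] = -H[i, k]          # tentative flip
                trial = margin_loss(H, S, m_sim, m_dis)
                if trial < best:
                    best, improved = trial, True
                else:
                    H[i, k] = -H[i, k]      # undo the flip
        if not improved:
            break
    return H, best

# Toy usage: four images in two classes, 8-bit codes, random initialization.
labels = np.array([0, 0, 1, 1])
S = (labels[:, None] == labels[None, :]).astype(int)
rng = np.random.default_rng(0)
H0 = rng.choice([-1, 1], size=(4, 8))
H1, final_loss = greedy_bit_flips(H0, S, m_sim=1, m_dis=5)
```

Because a flip is kept only when it strictly lowers the loss, the final loss never exceeds the initial one. The sketch recomputes the full loss for every candidate flip, which is acceptable only for toy sizes; an efficient solver would update just the terms involving the changed code.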
