SHOE: Sibling Hashing with Output Embeddings

We present a supervised binary encoding scheme for image retrieval that learns projections by taking into account similarity between classes obtained from output embeddings. Our motivation is that binary hash codes learned in this way improve the visual quality of retrieval results by ranking related (or ``sibling'') class images before unrelated class images. We employ a sequential greedy optimization that learns relationship aware projections by minimizing the difference between inner products of binary codes and output embedding vectors. We develop a joint optimization framework to learn projections which improve the accuracy of supervised hashing over the current state of the art with respect to standard and sibling evaluation metrics. We further obtain discriminative features learned from correlations of kernelized input CNN features and output embeddings, which significantly boosts performance. Experiments are performed on three datasets: CUB-2011, SUN-Attribute and ImageNet ILSVRC 2010, where we show significant improvement in sibling performance metrics over state-of-the-art supervised hashing techniques, while maintaining performance with respect to standard metrics.

[1]  Rongrong Ji,et al.  Supervised hashing with kernels , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Svetlana Lazebnik,et al.  Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.

[3]  Chen Xu,et al.  The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding , 2014, International Journal of Computer Vision.

[4]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[5]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[6]  Moses Charikar,et al.  Similarity estimation techniques from rounding algorithms , 2002, STOC '02.

[7]  David Suter,et al.  Fast Supervised Hashing with Decision Trees for High-Dimensional Data , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  John Langford,et al.  Multi-Label Prediction via Compressed Sensing , 2009, NIPS.

[10]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[11]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[12]  Kilian Q. Weinberger,et al.  Large Margin Taxonomy Embedding for Document Categorization , 2008, NIPS.