Deep Image Retrieval: Indicator and Gram Matrix Weighting for Aggregated Convolutional Features

Convolutional neural networks (CNNs) have proven to be effective feature extractors for many computer vision tasks, such as image classification and object detection. However, image retrieval in realistic scenarios usually involves large-scale unlabeled datasets, so learning a good model is often infeasible. In this paper, we propose a novel and interpretable image representation based on spatial-channel weighting of aggregated deep convolutional features. Specifically, we first determine the discriminative regions of an image by computing the Indicator matrix. We then extract distinctive features from these salient areas by calculating the Gram matrix, in which high-order features are learned. Finally, we generate a compact image representation by fusing the spatial saliency and channel sensitivity of the CNN features. Experimental results on several benchmark datasets (Oxford buildings, Paris buildings, and Holidays) indicate that the proposed approach outperforms state-of-the-art methods based on pre-trained deep networks.
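The three steps named in the abstract (spatial Indicator matrix, Gram-matrix channel weighting, fusion into one descriptor) can be illustrated with a minimal NumPy sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the thresholding rule in `indicator_matrix`, the log-inverse weighting in `gram_channel_weights`, and the function names themselves are hypothetical and should not be read as the authors' definitions.

```python
import numpy as np


def indicator_matrix(features):
    """Binary (H, W) mask of salient positions.

    ASSUMPTION: a position counts as salient when its channel-summed
    activation exceeds the spatial mean. The abstract does not state the rule.
    """
    energy = features.sum(axis=0)  # (H, W) total activation per position
    return (energy > energy.mean()).astype(features.dtype)


def gram_channel_weights(features, mask, eps=1e-8):
    """Per-channel weights derived from the Gram matrix of masked features.

    ASSUMPTION: channels with small Gram row sums (weak co-activation inside
    the salient region) get larger weights through a log-inverse ratio.
    """
    channels = features.shape[0]
    masked = (features * mask).reshape(channels, -1)  # (C, H*W)
    gram = masked @ masked.T                          # (C, C) channel correlations
    strength = np.abs(gram).sum(axis=1)               # (C,) row sums
    return np.log((strength.sum() + eps) / (strength + eps))


def aggregate(features):
    """Fuse spatial and channel weighting into one L2-normalized descriptor."""
    mask = indicator_matrix(features)
    weights = gram_channel_weights(features, mask)
    pooled = (features * mask).sum(axis=(1, 2))  # sum-pool over salient positions
    descriptor = pooled * weights
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor


# Stand-in for a (C, H, W) activation map from a pre-trained CNN layer.
rng = np.random.default_rng(0)
activations = rng.random((8, 5, 5))
descriptor = aggregate(activations)
print(descriptor.shape)
```

In a retrieval setting, descriptors produced this way would typically be compared with cosine similarity, which reduces to a dot product because each descriptor is L2-normalized.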
