Gram matrix based representation for image retrieval

In the field of image retrieval, most image representations based on convolutional neural networks (CNNs) are first-order: pooling or encoding methods are applied directly to the feature maps to produce compact image representations, while higher-order information, such as the dependencies between different channels of the same layer, is often neglected. In this paper, a novel image representation and retrieval algorithm based on the Gram matrix is proposed. Specifically, second-order features are first constructed from the Gram matrices of convolutional layers, which capture the relationships between different channels of the feature maps. Then, two weighting schemes, namely equal channel weighting and sparsity-sensitive channel weighting, are presented to aggregate these features into the final representation. Extensive experiments are conducted on four public image datasets, and the promising results demonstrate the effectiveness of the proposed algorithm.
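The abstract does not give the formulas, so the following is only a minimal pure-Python sketch under stated assumptions, not the paper's implementation. It assumes a conv layer's output is given as C channels, each flattened to H*W activations. The Gram matrix is normalized by H*W (a common convention). Row k of the Gram matrix is taken as channel k's second-order feature, and the rows are combined by a weighted sum followed by L2 normalization. The sparsity-sensitive weight used here, log((sum_j Q_j + eps) / (Q_k + eps)) with Q_k the fraction of positive activations in channel k, is a hypothetical stand-in loosely modelled on CroW-style channel weighting; the paper's actual scheme may differ. All function names are illustrative.

```python
import math


def gram_matrix(fmap):
    """Gram matrix of one conv layer.

    fmap: C channels, each a flat list of H*W activations.
    Returns a C x C list with G[i][j] = <f_i, f_j> / (H*W).
    """
    n = len(fmap[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in fmap] for fi in fmap]


def equal_weights(fmap):
    # Equal channel weighting: every channel's row counts the same.
    return [1.0] * len(fmap)


def sparsity_weights(fmap, eps=1e-6):
    # ASSUMPTION: IDF-like weight from Q_k, the fraction of positive
    # (non-zero after ReLU) activations in channel k; sparser channels
    # get larger weights. A stand-in, not the paper's published formula.
    q = [sum(1 for v in ch if v > 0) / len(ch) for ch in fmap]
    total = sum(q)
    return [math.log((total + eps) / (qk + eps)) for qk in q]


def aggregate(fmap, weights):
    # ASSUMPTION: row k of the Gram matrix is channel k's second-order
    # feature; rows are summed with the channel weights, then L2-normalized.
    g = gram_matrix(fmap)
    c = len(g)
    vec = [sum(weights[k] * g[k][j] for k in range(c)) for j in range(c)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(u, v):
    # Both descriptors are L2-normalized, so the dot product is the cosine.
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    # Toy post-ReLU maps: 3 channels of a 2x2 layer, flattened row-major.
    query = [[1.0, 0.0, 2.0, 0.0], [0.5, 0.5, 0.5, 0.5], [0.0, 0.0, 3.0, 0.0]]
    database = {
        "similar": [[1.0, 0.0, 1.5, 0.0], [0.4, 0.6, 0.5, 0.5], [0.0, 0.0, 2.5, 0.0]],
        "different": [[0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0]],
    }
    q = aggregate(query, sparsity_weights(query))
    scores = {name: cosine(q, aggregate(f, sparsity_weights(f))) for name, f in database.items()}
    print(sorted(scores, key=scores.get, reverse=True))  # ['similar', 'different']
```

In this sketch, channels that fire almost everywhere receive small weights, so co-activations involving rare, more distinctive channels dominate the descriptor; swapping in `equal_weights` makes every Gram row contribute equally. In practice the feature maps would come from a pretrained CNN layer rather than hand-written lists.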
