Regional Attention Based Deep Feature for Image Retrieval

Many approaches using Convolutional Neural Network (CNN) for efficient image retrieval have concentrated on feature aggregation rather than feature embedding over recent years, since convolutional features have been found to be reasonably discriminative. Nonetheless, we found that a well-known region-based feature aggregation method, RMAC, for image retrieval is suffered from the background clutter and varying importance of regions. In this work, we tackle these problems with a simple and effective, contextaware regional attention network that weights an attentive score of a region considering global attentiveness. We conduct various experiments on well-known retrieval datasets, and confirm that our method does not only improve the R-MAC baseline significantly, but also present new state-of-the-art results in the category of “pre-trained single-pass”. Furthermore, we show that our method shows higher accuracy improvement combined over prior methods, when combined with the query expansion method. These results are attributed by our novel regional-attention network integrated with R-MAC.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Albert Gordo,et al.  End-to-End Learning of Deep Visual Representations for Image Retrieval , 2016, International Journal of Computer Vision.

[3]  Simon Osindero,et al.  Cross-Dimensional Weighting for Aggregated Deep Convolutional Features , 2015, ECCV Workshops.

[4]  Hervé Jégou,et al.  Negative Evidences and Co-occurences in Image Retrieval: The Benefit of PCA and Whitening , 2012, ECCV.

[5]  Michael Isard,et al.  Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[6]  Yizhou Yu,et al.  Visual saliency based on multiscale deep features , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Noel E. O'Connor,et al.  Bags of Local Convolutional Features for Scalable Instance Search , 2016, ICMR.

[8]  Victor S. Lempitsky,et al.  Aggregating Local Deep Features for Image Retrieval , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[9]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  C. Schmid,et al.  On the burstiness of visual elements , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Xiaogang Wang,et al.  Saliency detection by multi-context deep learning , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Jaeyoon Kim,et al.  Fast 2D Complex Gabor Filter With Kernel Decomposition , 2017, IEEE Transactions on Image Processing.

[14]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[15]  Ngai-Man Cheung,et al.  Selective Deep Convolutional Features for Image Retrieval , 2017, ACM Multimedia.

[16]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[17]  Xavier Giró-i-Nieto,et al.  Class-Weighted Convolutional Features for Visual Instance Search , 2017, BMVC.

[18]  Ronan Sicre,et al.  Particular object retrieval with integral max-pooling of CNN activations , 2015, ICLR.

[19]  Ondrej Chum,et al.  CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples , 2016, ECCV.

[20]  Qi Tian,et al.  SIFT Meets CNN: A Decade Survey of Instance Retrieval , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Andrew Zisserman,et al.  Triangulation Embedding and Democratic Aggregation for Image Search , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Bohyung Han,et al.  Large-Scale Image Retrieval with Attentive Deep Local Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Albert Gordo,et al.  Deep Image Retrieval: Learning Global Representations for Image Search , 2016, ECCV.

[24]  Stéphane Dupont,et al.  Towards Good Practices for Image Retrieval Based on CNN Features , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[25]  L. Heilmann Proceedings, Part II , 1943, Ecology.

[26]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[27]  Yoshua Bengio,et al.  Série Scientifique Scientific Series Incorporating Second-order Functional Knowledge for Better Option Pricing Incorporating Second-order Functional Knowledge for Better Option Pricing , 2022 .

[28]  Xiu-Shen Wei,et al.  Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval , 2016, IEEE Transactions on Image Processing.

[29]  David Stutz,et al.  Neural Codes for Image Retrieval , 2015 .

[30]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Junwei Han,et al.  DHSNet: Deep Hierarchical Saliency Network for Salient Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).