Channel-Wise Part-Based Weighting Aggregation for Image Retrieval

The part-based weighting aggregation (PWA) method is a simple and straightforward way for creating powerful image representations in image retrieval. In this paper we analyze the burstiness among the responses of the aggregated channels in the PWA method, and propose a channel-wise weights utilizing the sum-pooling aggregation of the deep features’ responses to regulate the burstiness among the feature maps. Firstly, we apply the channel weights to the aggregation stage of PWA. The regulation of the burstiness among the aggregated channels boosts the PWA performance. Secondly, we use the channel weights to the part selection stage, namely, selecting the part detectors based on the channel weighted feature maps to obtain more discriminate parts to further improve the PWA performance. Comprehensive experiments on four standard datasets have demonstrated the effects of the proposed method.

[1]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[2]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[4]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[5]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[6]  Ngai-Man Cheung,et al.  Selective Deep Convolutional Features for Image Retrieval , 2017, ACM Multimedia.

[7]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[9]  Svetlana Lazebnik,et al.  Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[10]  Simon Osindero,et al.  Cross-Dimensional Weighting for Aggregated Deep Convolutional Features , 2015, ECCV Workshops.

[11]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[12]  David Stutz,et al.  Neural Codes for Image Retrieval , 2015 .

[13]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Qi Tian,et al.  SIFT Meets CNN: A Decade Survey of Instance Retrieval , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17]  Victor S. Lempitsky,et al.  Aggregating Local Deep Features for Image Retrieval , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[19]  Michael Isard,et al.  Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[20]  Noel E. O'Connor,et al.  Bags of Local Convolutional Features for Scalable Instance Search , 2016, ICMR.

[21]  Ronan Sicre,et al.  Particular object retrieval with integral max-pooling of CNN activations , 2015, ICLR.

[22]  Larry S. Davis,et al.  Exploiting local features from deep networks for image retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[23]  Florent Perronnin,et al.  Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  C. Schmid,et al.  On the burstiness of visual elements , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[26]  Chunheng Wang,et al.  Unsupervised Part-Based Weighting Aggregation of Deep Convolutional Features for Image Retrieval , 2017, AAAI.