Co-weighting semantic convolutional features for object retrieval

Abstract Deep feature aggregation, which aggregates a set of local convolutional features into a global image-level vector, has attracted increasing attention in object instance retrieval. In this manuscript, we propose an unsupervised framework that aggregates feature maps using an adaptive selection step and two weighting strategies. Specifically, the selection process finds the foreground contour by interpreting the semantic structure implied by the feature maps, while the two weighting processes comprise an adaptive Gaussian filter that highlights semantic features and an element-value-sensitive channel vector that activates feature channels corresponding to sparse yet distinctive image patterns. Experimental results on benchmark image retrieval datasets show that the selection and the two weighting schemes are complementary in improving the discriminative power of image vectors. Under the same experimental settings, the proposed approach outperforms state-of-the-art aggregation approaches by a considerable margin.
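The abstract names three stages (selection, spatial weighting, channel weighting) without giving formulas, so the following is only an illustrative sketch of how such a pipeline could be wired together, not the method proposed in the paper. Every concrete rule in it is an assumption made for illustration: the mean-energy threshold used for selection, the Gaussian centred on the centroid of the selected locations, the log inverse-sparsity channel weight, and the function name `aggregate` itself.

```python
import numpy as np


def aggregate(fmap, eps=1e-8):
    """Pool a (C, H, W) non-negative feature map into an L2-normalized (C,) vector.

    All rules below are illustrative assumptions, not the paper's formulas.
    """
    C, H, W = fmap.shape

    # Selection (assumed rule): keep locations whose channel-summed
    # activation is above the mean over the map.
    energy = fmap.sum(axis=0)
    mask = energy > energy.mean()
    if not mask.any():  # constant map: fall back to keeping everything
        mask = np.ones((H, W), dtype=bool)

    # Spatial weighting (assumed rule): a Gaussian centred on the centroid
    # of the selected locations, with width taken from their spread.
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    sigma = max(ys.std(), xs.std(), 1.0)
    yy, xx = np.mgrid[0:H, 0:W]
    gauss = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * sigma ** 2))
    spatial = gauss * mask

    # Channel weighting (assumed rule): channels that fire rarely get a
    # larger weight, via a log inverse-sparsity term.
    nonzero_frac = (fmap > 0).mean(axis=(1, 2))
    channel = np.log((nonzero_frac.sum() + eps) / (nonzero_frac + eps))

    # Weighted sum-pooling over space, then per-channel scaling.
    pooled = (fmap * spatial[None, :, :]).sum(axis=(1, 2))
    vec = pooled * channel
    return vec / (np.linalg.norm(vec) + eps)


# Usage on a random ReLU-like map standing in for a real convolutional layer.
rng = np.random.default_rng(0)
fmap = np.maximum(rng.normal(size=(8, 6, 6)), 0.0)
descriptor = aggregate(fmap)
```

In a retrieval setting, descriptors produced this way would typically be compared by dot product, which equals cosine similarity once the vectors are L2-normalized.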
