论文信息 - A multi-level descriptor using ultra-deep feature for image retrieval

A multi-level descriptor using ultra-deep feature for image retrieval

CNN(Convolution Neural Network)-based descriptor generation is extensively studied recently for image retrieval. CNN deep feature trained for image classification is proved to have good transferability for image retrieval task. However, building a highly discriminative descriptor with CNN feature is still an important issue. The feature of the fully-connected layer is usually used and the shallow features of an image are ignored. In this paper, we proposed a simple and effective multi-level descriptor. Firstly, we proposed a multi-level feature fusion (MFF) method to capture low-level color/texture and high-level semantic information simultaneously. MFF replaces the commonly-used “object-level” with “part-level”, and the filters of convolution layer are seen as part detectors, instead of using an object detector method explicitly. The complementary nature of low-level and high-level feature benefits MFF greatly. Secondly, we trained a neural net with class information to further improve the discriminative power of MFF. Our MFF achieves good performance on public image retrieval datasets. Finally, a compressed version is proposed and achieves close performance to the uncompressed version.

Junqing Yu | Zebin Wu | Zebin Wu | Junqing Yu

[1] Hongxun Yao,et al. Exploiting the complementary strengths of multi-layer CNN features for image retrieval , 2017, Neurocomputing.

[2] Albert Gordo,et al. Deep Image Retrieval: Learning Global Representations for Image Search , 2016, ECCV.

[3] Svetlana Lazebnik,et al. Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[4] Simon Osindero,et al. Cross-Dimensional Weighting for Aggregated Deep Convolutional Features , 2015, ECCV Workshops.

[5] Stefan Carlsson,et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[6] Christopher Hunt,et al. Notes on the OpenSURF Library , 2009 .

[7] Ronan Sicre,et al. Particular object retrieval with integral max-pooling of CNN activations , 2015, ICLR.

[8] Qi Tian,et al. Scalable Bag of Selected Deep Features for Visual Instance Retrieval , 2018, MMM.

[9] C. Schmid,et al. On the burstiness of visual elements , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[10] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Andrew Zisserman,et al. Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[12] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13] David Nistér,et al. Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[14] Luc Van Gool,et al. SURF: Speeded Up Robust Features , 2006, ECCV.

[15] Nicu Sebe,et al. A Survey on Learning to Hash , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16] Atsuto Maki,et al. Factors of Transferability for a Generic ConvNet Representation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17] Cordelia Schmid,et al. Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18] Richard Szeliski,et al. Building Rome in a day , 2009, ICCV.

[19] Stéphane Dupont,et al. Towards Good Practices for Image Retrieval Based on CNN Features , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[20] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[21] Ondrej Chum,et al. CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples , 2016, ECCV.

[22] Shin'ichi Satoh,et al. Faster R-CNN Features for Instance Search , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[23] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Qi Tian,et al. Query-adaptive late fusion for image search and person re-identification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Qi Tian,et al. Packing and Padding: Coupled Multi-index for Accurate Image Retrieval , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[26] Michael Isard,et al. Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[27] Qi Tian,et al. Feature selection using principal feature analysis , 2007, ACM Multimedia.

[28] Qi Tian,et al. Image Classification and Retrieval are ONE , 2015, ICMR.

[29] Xiangyang Wang,et al. Content-based image retrieval by integrating color and texture features , 2012, Multimedia Tools and Applications.

[30] Cordelia Schmid,et al. Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[31] Victor S. Lempitsky,et al. Neural Codes for Image Retrieval , 2014, ECCV.

[32] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[33] Josef Sivic,et al. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34] Qi Tian,et al. Accurate Image Search with Multi-Scale Contextual Evidences , 2016, International Journal of Computer Vision.

[35] Ming Yang,et al. Query Specific Fusion for Image Retrieval , 2012, ECCV.

[36] Larry S. Davis,et al. Exploiting local features from deep networks for image retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[37] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[38] Florent Perronnin,et al. Large-scale image retrieval with compressed Fisher vectors , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[39] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[40] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[41] Ngai-Man Cheung,et al. Selective Deep Convolutional Features for Image Retrieval , 2017, ACM Multimedia.

[42] Jianru Xue,et al. Image Retrieval using Heat Diffusion for Deep Feature Aggregation , 2018 .

[43] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[44] Jiwen Lu,et al. Deep Hashing for Scalable Image Search , 2017, IEEE Transactions on Image Processing.

[45] Yonghong Tian,et al. CNN vs. SIFT for Image Retrieval: Alternative or Complementary? , 2016, ACM Multimedia.

[46] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[47] Victor S. Lempitsky,et al. Aggregating Local Deep Features for Image Retrieval , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[48] Qi Tian,et al. Exploiting Hierarchical Activations of Neural Network for Image Retrieval , 2016, ACM Multimedia.