A multi-level descriptor using ultra-deep feature for image retrieval

Descriptor generation based on convolutional neural networks (CNNs) has been studied extensively in recent years for image retrieval. Deep CNN features trained for image classification have been shown to transfer well to the image retrieval task. However, building a highly discriminative descriptor from CNN features remains an open problem. Existing methods usually take the features of the fully-connected layer and ignore the shallow features of an image. In this paper, we propose a simple and effective multi-level descriptor. First, we propose a multi-level feature fusion (MFF) method that captures low-level color/texture information and high-level semantic information simultaneously. MFF replaces the commonly used “object-level” representation with a “part-level” one: the filters of the convolution layers are treated as part detectors, so no explicit object detector is needed. MFF benefits greatly from the complementary nature of low-level and high-level features. Second, we train a neural network with class information to further improve the discriminative power of MFF. MFF achieves good performance on public image retrieval datasets. Finally, we propose a compressed version whose performance is close to that of the uncompressed version.
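
The idea of fusing part-level responses from several layers can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the pooling operator, the layers used, or the normalization, so the choices below (global max-pooling per filter, L2 normalization, plain concatenation) and the names `l2_normalize` and `fuse_multilevel` are assumptions made purely for illustration. Random arrays stand in for real convolutional activations.

```python
import numpy as np


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps avoids division by zero)."""
    return v / (np.linalg.norm(v) + eps)


def fuse_multilevel(feature_maps):
    """Fuse activations from several conv layers into one descriptor.

    feature_maps: list of arrays shaped (channels, height, width),
    ordered from shallow (color/texture) to deep (semantic) layers.

    Assumption: each filter is treated as a "part detector" and is
    summarized by its strongest spatial response (global max-pooling).
    """
    pooled = []
    for fmap in feature_maps:
        channels = fmap.shape[0]
        # One value per filter: its maximum response over all positions.
        response = fmap.reshape(channels, -1).max(axis=1)
        # Normalize per layer so no single layer dominates the fusion.
        pooled.append(l2_normalize(response))
    # Concatenate the per-layer vectors and renormalize the result.
    return l2_normalize(np.concatenate(pooled))


# Stand-in activations (hypothetical shapes, not taken from the paper).
rng = np.random.default_rng(0)
low_level = rng.random((64, 56, 56))   # shallow layer
high_level = rng.random((512, 7, 7))   # deep layer

descriptor = fuse_multilevel([low_level, high_level])
print(descriptor.shape)  # (576,)
```

With unit-norm descriptors, two images can be compared by a dot product, which then equals their cosine similarity.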
