Paper Information - Towards Optimal CNN Descriptors for Large-Scale Image Retrieval

Instance-level image retrieval is a long-standing and challenging problem in multimedia. Recently, fine-tuning Convolutional Neural Networks (CNNs) has become a promising direction, and a number of successful strategies based on global CNN descriptors have been proposed. However, it is difficult to make direct comparisons and draw conclusions due to different settings and/or datasets. The goal of this paper is two-fold. Firstly, we present a unified implementation of modern global-CNN-based retrieval systems, break such a system into six major components, and investigate each part individually as well as globally when considering different configurations. We conduct a systematic series of experiments on a component-by-component basis and find an optimal solution in designing such a system. Secondly, we introduce a novel joint loss function with a learnable parameter for fine-tuning on retrieval tasks and show, with extensive experiments, significant improvement over previous works. On the new and challenging large-scale Google-Landmarks-Dataset, we set a baseline for future research and comparisons, while on traditional retrieval benchmarks such as Oxford5k and Paris6k, as well as their recent revised versions ROxford5k and RParis6k, we achieve state-of-the-art performance under all three (Easy, Medium, and Hard) evaluation protocols by a large margin compared to competing methods.

Yu-Gang Jiang | Yinzheng Gu | Chuanpeng Li
