论文信息 - Where to Focus: Query Adaptive Matching for Instance Retrieval Using Convolutional Feature Maps

Where to Focus: Query Adaptive Matching for Instance Retrieval Using Convolutional Feature Maps

Instance retrieval requires one to search for images that contain a particular object within a large corpus. Recent studies show that using image features generated by pooling convolutional layer feature maps (CFMs) of a pretrained convolutional neural network (CNN) leads to promising performance for this task. However, due to the global pooling strategy adopted in those works, the generated image feature is less robust to image clutter and tends to be contaminated by the irrelevant image patterns. In this article, we alleviate this drawback by proposing a novel reranking algorithm using CFMs to refine the retrieval result obtained by existing methods. Our key idea, called query adaptive matching (QAM), is to first represent the CFMs of each image by a set of base regions which can be freely combined into larger regions-of-interest. Then the similarity between the query and a candidate image is measured by the best similarity score that can be attained by comparing the query feature and the feature pooled from a combined region. We show that the above procedure can be cast as an optimization problem and it can be solved efficiently with an off-the-shelf solver. Besides this general framework, we also propose two practical ways to create the base regions. One is based on the property of the CFM and the other one is based on a multi-scale spatial pyramid scheme. Through extensive experiments, we show that our reranking approaches bring substantial performance improvement and by applying them we can outperform the state of the art on several instance retrieval benchmarks.

[1] Cordelia Schmid,et al. Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[2] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[3] Ronan Sicre,et al. Particular object retrieval with integral max-pooling of CNN activations , 2015, ICLR.

[4] Jiri Matas,et al. Total recall II: Query expansion revisited , 2011, CVPR 2011.

[5] Victor S. Lempitsky,et al. Aggregating Deep Convolutional Features for Image Retrieval , 2015, ArXiv.

[6] Andrew Zisserman,et al. Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7] Ulrike von Luxburg,et al. A tutorial on spectral clustering , 2007, Stat. Comput..

[8] Jiri Matas,et al. Learning Vocabularies over a Fine Quantization , 2013, International Journal of Computer Vision.

[9] Andrew Zisserman,et al. Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10] Victor S. Lempitsky,et al. Neural Codes for Image Retrieval , 2014, ECCV.

[11] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Atsuto Maki,et al. A Baseline for Visual Instance Retrieval with Deep Convolutional Networks , 2014, ICLR 2015.

[13] Cordelia Schmid,et al. Improving Bag-of-Features for Large Scale Image Search , 2010, International Journal of Computer Vision.

[14] Anton van den Hengel,et al. The treasure beneath convolutional layers: Cross-convolutional-layer pooling for image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[16] Florent Perronnin,et al. Large-scale image retrieval with compressed Fisher vectors , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[18] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[19] Trevor Darrell,et al. Do Convnets Learn Correspondence? , 2014, NIPS.

[20] Andrew Zisserman,et al. All About VLAD , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21] Stefan Carlsson,et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[22] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[23] Arnold W. M. Smeulders,et al. Locality in Generic Instance Search from One Example , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24] Cordelia Schmid,et al. Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25] Andrew Zisserman,et al. Triangulation Embedding and Democratic Aggregation for Image Search , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[26] Simon Osindero,et al. Cross-Dimensional Weighting for Aggregated Deep Convolutional Features , 2015, ECCV Workshops.

[27] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[28] Andrew Zisserman,et al. Smooth object retrieval using a bag of boundaries , 2011, 2011 International Conference on Computer Vision.

[29] Michael Isard,et al. Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[30] Michael Isard,et al. Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[31] Ji Wan,et al. Deep Learning for Content-Based Image Retrieval: A Comprehensive Study , 2014, ACM Multimedia.

[32] Forrest N. Iandola,et al. DenseNet: Implementing Efficient ConvNet Descriptor Pyramids , 2014, ArXiv.

[33] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34] Shuang Wang,et al. INSTRE: A New Benchmark for Instance-Level Object Retrieval and Recognition , 2015, ACM Trans. Multim. Comput. Commun. Appl..

[35] Steven C. H. Hoi,et al. Fast Object Retrieval Using Direct Spatial Matching , 2015, IEEE Transactions on Multimedia.