Enlarging the discriminability of bag-of-words representations with deep convolutional features

In this work, we propose an extension of established image retrieval models based on the bag-of-words representation, i.e., models that quantize local features such as SIFT in order to leverage an inverted file index for speedup. Because quantization impairs the discriminability of local features, the ability to retrieve the database images that show the same object or scene as a given query image decreases as the number of images in the database grows. We address this issue by extending each quantized local feature with information from its local spatial neighborhood, in the form of a representation obtained by pooling features from the outputs of deep convolutional neural network layers. Using four public datasets, we evaluate both the discriminability of the representation and its overall performance in a large-scale image retrieval setup.
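
To make the idea concrete, the following is a minimal sketch of how a quantized local feature might be paired with a pooled CNN descriptor inside an inverted file. It is an illustration under stated assumptions, not the method evaluated in this work: the abstract does not specify the pooling operator, the neighborhood size, or the matching rule, so max-pooling over a square window, cosine similarity, and the `min_similarity` threshold are all assumptions, as are the function names and the requirement that keypoint positions are already given in feature-map coordinates.

```python
import numpy as np
from collections import defaultdict


def quantize(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word (hard quantization)."""
    # Squared Euclidean distance between every descriptor and every centroid.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def pool_neighborhood(feature_map, center, radius):
    """Max-pool a (C, H, W) activation map over a square window around `center`.

    Assumption: `center` is a (row, col) pair in feature-map coordinates.
    The result is L2-normalized so a dot product gives cosine similarity.
    """
    _, h, w = feature_map.shape
    r, c = center
    r0, r1 = max(r - radius, 0), min(r + radius + 1, h)
    c0, c1 = max(c - radius, 0), min(c + radius + 1, w)
    pooled = feature_map[:, r0:r1, c0:c1].max(axis=(1, 2))
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled


def build_index(images, vocabulary, radius=1):
    """Inverted file: visual word -> list of (image_id, pooled context descriptor)."""
    index = defaultdict(list)
    for image_id, (descriptors, positions, feature_map) in images.items():
        words = quantize(descriptors, vocabulary)
        for word, pos in zip(words, positions):
            context = pool_neighborhood(feature_map, pos, radius)
            index[int(word)].append((image_id, context))
    return index


def score(index, query, vocabulary, radius=1, min_similarity=0.8):
    """Vote for database images whose features share a visual word with the
    query AND have a similar pooled context (hypothetical matching rule)."""
    descriptors, positions, feature_map = query
    votes = defaultdict(float)
    for word, pos in zip(quantize(descriptors, vocabulary), positions):
        q = pool_neighborhood(feature_map, pos, radius)
        for image_id, context in index.get(int(word), []):
            if float(q @ context) >= min_similarity:
                votes[image_id] += 1.0
    return dict(votes)
```

In this sketch, two features that fall into the same visual word only count as a match when their pooled neighborhood descriptors also agree, which is one plausible way the additional information could counteract the loss of discriminability caused by quantization.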
