Towards Extending Bag-of-Words Models Using Context Features for a 2D Inverted Index

This paper addresses the image retrieval problem of finding, in a large dataset, images that contain scenes or objects similar to those in a given query image. This task is commonly performed with the popular Bag-of-Words (BoW) model, which quantizes local features such as SIFT and speeds up retrieval with an inverted-file index. We focus on the limits of the model for very large datasets: quantizing the individual feature descriptors impairs their discriminative power, so with growing dataset size the model is increasingly distracted by irrelevant images that happen to produce similar signatures. Our goal is to additionally consider neighboring features and their geometry and to condense them into a new context feature that is quantized as well. Since this quantized context information introduces a second dimension into the BoW model, it improves both retrieval speed and accuracy. Using the public Oxford5k and Holidays datasets, we define an appropriate framework and evaluate different ways of constructing, reducing the dimensionality of, and quantizing the context features.
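
To illustrate the idea of a second, context-based dimension in the inverted index, the sketch below shows one plausible reading of such a structure: postings are keyed by pairs of a quantized visual word and a quantized context word, so only features that agree in both dimensions vote for a database image. The class, the word IDs, and the simple voting score are hypothetical illustrations, not the paper's implementation; in practice the two vocabularies would be learned codebooks over SIFT descriptors and over the condensed context features.

```python
# Minimal sketch (assumed design) of a 2D inverted index keyed by
# (visual word, context word) pairs.
from collections import defaultdict


class TwoDInvertedIndex:
    def __init__(self):
        # (visual_word_id, context_word_id) -> list of image ids
        self.postings = defaultdict(list)

    def add_image(self, image_id, quantized_features):
        """quantized_features: iterable of (visual_word_id, context_word_id) pairs."""
        for vw, cw in quantized_features:
            self.postings[(vw, cw)].append(image_id)

    def query(self, quantized_features):
        """Vote for database images whose features match the query in both the
        visual word and the context word; return image ids sorted by score."""
        scores = defaultdict(int)
        for vw, cw in quantized_features:
            for image_id in self.postings.get((vw, cw), []):
                scores[image_id] += 1
        return sorted(scores, key=scores.get, reverse=True)


# Toy usage: image 0 shares two (visual word, context word) cells with the query.
index = TwoDInvertedIndex()
index.add_image(0, [(12, 3), (57, 1), (98, 0)])
index.add_image(1, [(12, 7), (33, 2)])
print(index.query([(12, 3), (98, 0)]))  # -> [0]
```

Compared with a plain BoW inverted index keyed only by the visual word, the extra context dimension shrinks each posting list and filters out accidental visual-word matches, which is the intuition behind the claimed gains in both speed and accuracy.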
