Three things everyone should know to improve object retrieval

The objective of this work is object retrieval in large scale image datasets, where the object is specified by an image query and retrieval should be immediate at run time in the manner of Video Google [28]. We make the following three contributions: (i) a new method to compare SIFT descriptors (RootSIFT) which yields superior performance without increasing processing or storage requirements; (ii) a novel method for query expansion where a richer model for the query is learnt discriminatively in a form suited to immediate retrieval through efficient use of the inverted index; (iii) an improvement of the image augmentation method proposed by Turcot and Lowe [29], where only the augmenting features which are spatially consistent with the augmented image are kept. We evaluate these three methods over a number of standard benchmark datasets (Oxford Buildings 5k and 105k, and Paris 6k) and demonstrate substantial improvements in retrieval performance whilst maintaining immediate retrieval speeds. Combining these complementary methods achieves a new state-of-the-art performance on these datasets.

[1]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[2]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[3]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[4]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[5]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[6]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Michael Isard,et al.  Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[8]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[9]  Andrew Zisserman,et al.  Object Mining Using a Matching Graph on Very Large Image Collections , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[10]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Cor J. Veenman,et al.  Kernel Codebooks for Scene Categorization , 2008, ECCV.

[12]  Michael Werman,et al.  A Linear Time Histogram Metric for Improved SIFT Matching , 2008, ECCV.

[13]  C. Schmid,et al.  On the burstiness of visual elements , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Panu Turcot,et al.  Better matching with fewer features: The selection of useful features in large database recognition problems , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[15]  Matthew A. Brown,et al.  Picking the best DAISY , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Jiri Matas,et al.  Efficient representation of local geometry for large scale object retrieval , 2009, CVPR.

[17]  Cordelia Schmid,et al.  Improving Bag-of-Features for Large Scale Image Search , 2010, International Journal of Computer Vision.

[18]  Ameesh Makadia,et al.  Feature Tracking for Wide-Baseline Image Retrieval , 2010, ECCV.

[19]  Florent Perronnin,et al.  Large-scale image categorization with explicit data embedding , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[21]  Matthew Johnson,et al.  Generalized Descriptor Compression for Storage and Matching , 2010, BMVC.

[22]  Andrew Zisserman,et al.  Efficient additive kernels via explicit feature maps , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[23]  Jiri Matas,et al.  Large-Scale Discovery of Spatially Related Images , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Jiri Matas,et al.  Learning a Fine Vocabulary , 2010, ECCV.

[25]  Michael Isard,et al.  Descriptor Learning for Efficient Retrieval , 2010, ECCV.

[26]  Tomás Pajdla,et al.  Avoiding Confusing Features in Place Recognition , 2010, ECCV.

[27]  Michael Werman,et al.  The Quadratic-Chi Histogram Distance Family , 2010, ECCV.

[28]  Hervé Jégou,et al.  Asymmetric Hamming Embedding , 2011, MM 2011.

[29]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[30]  Luc Van Gool,et al.  Hello neighbor: Accurate object retrieval with k-reciprocal nearest neighbors , 2011, CVPR 2011.

[31]  Patrick Gros,et al.  Asymmetric hamming embedding: taking the best of our bits for large scale image search , 2011, ACM Multimedia.

[32]  Jiri Matas,et al.  Total recall II: Query expansion revisited , 2011, CVPR 2011.