Hello neighbor: Accurate object retrieval with k-reciprocal nearest neighbors

This paper introduces a simple yet effective method to improve visual word based image retrieval. Our method is based on an analysis of the k-reciprocal nearest neighbor structure in the image space. At query time the information obtained from this process is used to treat different parts of the ranked retrieval list with different distance measures. This leads effectively to a re-ranking of retrieved images. As we will show, this has two benefits: first, using different similarity measures for different parts of the ranked list allows for compensation of the “curse of dimensionality”. Second, it allows for dealing with the uneven distribution of images in the data space. Dealing with both challenges has very beneficial effect on retrieval accuracy. Furthermore, a major part of the process happens offline, so it does not affect speed at retrieval time. Finally, the method operates on the bag-of-words level only, thus it could be combined with any additional measures on e.g. either descriptor level or feature geometry making room for further improvement. We evaluate our approach on common object retrieval benchmarks and demonstrate a significant improvement over standard bag-of-words retrieval.

[1]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[2]  C. Schmid,et al.  On the burstiness of visual elements , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[4]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[5]  Jiri Matas,et al.  Learning a Fine Vocabulary , 2010, ECCV.

[6]  Michael Isard,et al.  Descriptor Learning for Efficient Retrieval , 2010, ECCV.

[7]  Michael Isard,et al.  Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[8]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[9]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[10]  Cordelia Schmid,et al.  Accurate Image Search Using the Contextual Dissimilarity Measure , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[12]  Jiri Matas,et al.  Unsupervised discovery of co-occurrence in sparse high dimensional data , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13]  Cordelia Schmid,et al.  A contextual dissimilarity measure for accurate and efficient image search , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Andrew Zisserman,et al.  Near Duplicate Image Detection: min-Hash and tf-idf Weighting , 2008, BMVC.