Object retrieval and localization with spatially-constrained similarity measure and k-NN re-ranking

One fundamental problem in object retrieval with the bag-of-visual words (BoW) model is its lack of spatial information. Although various approaches are proposed to incorporate spatial constraints into the BoW model, most of them are either too strict or too loose so that they are only effective in limited cases. We propose a new spatially-constrained similarity measure (SCSM) to handle object rotation, scaling, view point change and appearance deformation. The similarity measure can be efficiently calculated by a voting-based method using inverted files. Object retrieval and localization are then simultaneously achieved without post-processing. Furthermore, we introduce a novel and robust re-ranking method with the k-nearest neighbors of the query for automatically refining the initial search results. Extensive performance evaluations on six public datasets show that SCSM significantly outperforms other spatial models, while k-NN re-ranking outperforms most state-of-the-art approaches using query expansion.

[1]  Jiri Matas,et al.  Efficient representation of local geometry for large scale object retrieval , 2009, CVPR.

[2]  Tsuhan Chen,et al.  Image retrieval with geometry-preserving visual phrases , 2011, CVPR 2011.

[3]  Luc Van Gool,et al.  Hello neighbor: Accurate object retrieval with k-reciprocal nearest neighbors , 2011, CVPR 2011.

[4]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[5]  C. Schmid,et al.  On the burstiness of visual elements , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Jiri Matas,et al.  Learning a Fine Vocabulary , 2010, ECCV.

[7]  O. Chum,et al.  Geometric min-Hashing: Finding a (thick) needle in a haystack , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Michael Isard,et al.  Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[9]  Yannis Avrithis,et al.  Speeded-up, relaxed spatial matching , 2011, 2011 International Conference on Computer Vision.

[10]  Ming Yang,et al.  Contextual weighting for vocabulary tree based image retrieval , 2011, 2011 International Conference on Computer Vision.

[11]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004 .

[12]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[13]  Jiri Matas,et al.  Unsupervised discovery of co-occurrence in sparse high dimensional data , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[15]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16]  Zhe L. Lin,et al.  A Local Bag-of-Features Model for Large-Scale Object Retrieval , 2010, ECCV.

[17]  Michael Isard,et al.  Bundling features for large scale partial-duplicate web image search , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[19]  Jiri Matas,et al.  Total recall II: Query expansion revisited , 2011, CVPR 2011.

[20]  Ricardo da Silva Torres,et al.  Exploiting contextual spaces for image re-ranking and rank aggregation , 2011, ICMR.

[21]  Michael Isard,et al.  Descriptor Learning for Efficient Retrieval , 2010, ECCV.

[22]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[23]  Changhu Wang,et al.  Spatial-bag-of-features , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[24]  Jiri Matas,et al.  Geometric min-Hashing: Finding a (thick) needle in a haystack , 2009, CVPR.

[25]  Cordelia Schmid,et al.  A contextual dissimilarity measure for accurate and efficient image search , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Christoph H. Lampert Detecting objects in large image collections and videos by efficient subimage retrieval , 2009, 2009 IEEE 12th International Conference on Computer Vision.