Spatially-Constrained Similarity Measure for Large-Scale Object Retrieval

One fundamental problem in object retrieval with the bag-of-words model is its lack of spatial information. Although various approaches have been proposed to incorporate spatial constraints into the model, most of them are either too strict or too loose to be effective beyond limited cases. In this paper, a new spatially-constrained similarity measure (SCSM) is proposed to handle object rotation, scaling, viewpoint change and appearance deformation. The similarity measure can be efficiently calculated by a voting-based method using inverted files. During the retrieval process, SCSM also localizes the object in the database images simultaneously, without post-processing. Furthermore, based on the retrieval and localization results of SCSM, we introduce a novel and robust re-ranking method that uses the k-nearest neighbors of the query to automatically refine the initial search results. Extensive performance evaluations on six public data sets show that SCSM significantly outperforms other spatial models, including RANSAC-based spatial verification, while k-NN re-ranking outperforms most state-of-the-art approaches that use query expansion. We also adapt SCSM to mobile product image search with an iterative algorithm that simultaneously extracts the product instance from the mobile query image, identifies the instance, and retrieves visually similar product images. Experiments on two product image search data sets show that our approach can robustly localize and extract the product in the query image, and hence drastically improve retrieval accuracy over baseline methods.
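The idea of computing a spatially-constrained similarity by voting over an inverted file can be illustrated with a generalized Hough-style sketch. This is a minimal, simplified illustration under stated assumptions, not the SCSM formulation itself: features are reduced to a visual word and an (x, y) location, the candidate transformations are a small hypothetical grid of scales and rotation angles, each match is weighted by an idf-style term chosen only for illustration, and predicted object centers are quantized into square cells of a hypothetical size `cell`. Each query/database feature pair that shares a visual word votes, for every candidate transformation, for the database-image location where the center of the query region would land. An image's score is its largest accumulated vote, and the winning cell doubles as a rough localization. The names `build_index` and `score_images` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical feature format: (visual_word, x, y). Keypoint scale and
# orientation are omitted here for brevity.

def build_index(database):
    """Build an inverted file word -> [(image_id, x, y)] plus per-image word counts."""
    inverted = defaultdict(list)
    tf_db = defaultdict(Counter)
    for image_id, feats in database.items():
        for word, x, y in feats:
            inverted[word].append((image_id, x, y))
            tf_db[image_id][word] += 1
    return inverted, tf_db, len(database)


def score_images(query_feats, query_center, index,
                 scales=(0.5, 1.0, 2.0), angles_deg=(0, 90, 180, 270), cell=10.0):
    """Return image_id -> (score, scale, angle, predicted_center) via Hough-style voting."""
    inverted, tf_db, num_images = index
    cx, cy = query_center
    tf_q = Counter(word for word, _, _ in query_feats)
    votes = defaultdict(float)  # (image_id, scale, angle, cell_x, cell_y) -> weight

    for word, qx, qy in query_feats:
        postings = inverted.get(word)  # .get() does not insert into the defaultdict
        if not postings:
            continue  # word absent from the database: no matches, no votes
        df = len({image_id for image_id, _, _ in postings})
        idf = math.log(num_images / df)  # illustrative idf weighting (assumption)
        ox, oy = qx - cx, qy - cy  # offset of the query feature from the region center
        for image_id, dx, dy in postings:
            weight = idf * idf / (tf_q[word] * tf_db[image_id][word])
            for s in scales:
                for a in angles_deg:
                    t = math.radians(a)
                    # Predicted center in the database image: g - s * R(t) * (f - c_q)
                    px = dx - s * (math.cos(t) * ox - math.sin(t) * oy)
                    py = dy - s * (math.sin(t) * ox + math.cos(t) * oy)
                    key = (image_id, s, a, math.floor(px / cell), math.floor(py / cell))
                    votes[key] += weight

    # For each image, keep the strongest (transformation, location) bin.
    best = {}
    for (image_id, s, a, gx, gy), v in votes.items():
        if image_id not in best or v > best[image_id][0]:
            center = ((gx + 0.5) * cell, (gy + 0.5) * cell)
            best[image_id] = (v, s, a, center)
    return best


# Toy usage: "a" holds a translated copy of the query layout, "b" the same
# words scattered inconsistently, "c" an unrelated word.
query = [(1, 30, 30), (2, 70, 30), (3, 30, 70), (4, 70, 70)]
database = {
    "a": [(1, 130, 80), (2, 170, 80), (3, 130, 120), (4, 170, 120)],
    "b": [(1, 10, 10), (2, 200, 30), (3, 40, 300), (4, 300, 300)],
    "c": [(9, 50, 50)],
}
best = score_images(query, (50, 50), build_index(database))
print(max(best, key=lambda k: best[k][0]))  # prints "a"
```

In this toy run, all four matches in image "a" agree on scale 1.0 and angle 0 and land in the cell centered at (155.0, 105.0), while the matches in "b" never agree, so "a" ranks first. Quantizing centers into fixed cells is the simplest choice here; votes from a true match can straddle a cell boundary, so a smoothed vote map or overlapping bins would make the sketch more tolerant.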
