Embedding spatial context information into inverted file for large-scale image retrieval

One of the most popular approaches to large-scale content-based image retrieval is based on the Bag-of-Visual-Words model. Since the spatial context among local features is important for identifying visual content, many approaches index geometric cues of the local features, such as location, scale, and orientation, for post-verification. To maintain consistent accuracy, the number of top-ranked images that a post-verification approach must process grows in proportion to the size of the image database; when the database is very large, there are too many images to verify within a real-time response. To address this issue, in this paper we explore two approaches to embedding spatial context information into the inverted file. The first builds a spatial relationship dictionary that encodes the spatial context among local features, which we call the one-to-one spatial relationship method. The second generates a spatial context binary signature for each feature, which we call the one-to-multiple spatial relationship method. We then build an inverted file that carries spatial information between local features, so that geometric verification is achieved implicitly while traversing the inverted file. Experimental results on the benchmark Holidays dataset demonstrate the efficiency of the proposed algorithm.
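
To make the idea concrete, below is a minimal sketch (not the authors' implementation) of an inverted file whose postings carry a per-feature spatial context binary signature, in the spirit of the one-to-multiple spatial relationship method. The signature construction (hashing the visual words found in a feature's neighborhood into a fixed-length bitmask), the Hamming threshold `max_hamming`, and the class/function names are illustrative assumptions.

```python
from collections import defaultdict

def spatial_signature(neighbor_words, num_bits=32):
    """Hash the visual words in a feature's spatial neighborhood into a
    fixed-length bit signature (a stand-in for a spatial context encoding)."""
    sig = 0
    for w in neighbor_words:
        sig |= 1 << (hash(w) % num_bits)
    return sig

class SpatialInvertedFile:
    """Inverted file whose postings store a spatial context signature, so a
    weak geometric check happens while the posting lists are traversed."""

    def __init__(self, max_hamming=8):
        # visual word -> list of (image_id, signature)
        self.postings = defaultdict(list)
        self.max_hamming = max_hamming

    def add(self, image_id, features):
        # features: iterable of (visual_word, neighbor_words)
        for word, neighbors in features:
            self.postings[word].append((image_id, spatial_signature(neighbors)))

    def query(self, features):
        scores = defaultdict(int)
        for word, neighbors in features:
            q_sig = spatial_signature(neighbors)
            for image_id, db_sig in self.postings[word]:
                # Hamming distance between spatial signatures acts as an
                # implicit geometric consistency filter during traversal.
                if bin(q_sig ^ db_sig).count("1") <= self.max_hamming:
                    scores[image_id] += 1
        return sorted(scores.items(), key=lambda kv: -kv[1])
```

In this sketch, a candidate match contributes to an image's score only if its stored signature is close (in Hamming distance) to the query feature's signature, so no separate post-verification pass over the top-ranked images is needed.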
