Contextual Query Expansion for Image Retrieval

In this paper, we study image retrieval and introduce contextual query expansion to address two shortcomings of bag-of-words based frameworks: the semantic gap caused by visual word quantization, and the efficiency and storage cost of query expansion. Our method is built on common visual patterns (CVPs), which are the distinctive visual structures shared by two images and carry rich contextual information. With CVPs, we explore two contextual query expansions, one at the visual word level and one at the image level. For visual word-level expansion, we find contextual synonymous visual words (CSVWs) and expand each word in the query image with its CSVWs to boost retrieval accuracy. CSVWs are words that appear in the same CVPs and have the same contextual meaning, i.e., a similar spatial layout and similar geometric transformations. For image-level expansion, database images that share the same CVPs are organized in a linked list, and images that share CVPs with the query image but are missing from the initial results are automatically added to them. The main computation of both expansions is carried out offline, and both can be integrated into the inverted file and applied efficiently to all images in the dataset. Experiments on three reference datasets and on a dataset of one million images demonstrate the effectiveness and efficiency of our method.
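To make the two expansions concrete, the following is a minimal Python sketch of how they might plug into an inverted file at query time. It is not the authors' implementation. The names `build_inverted_file`, `search`, `csvw`, `cvp_links`, `top_k`, and `synonym_weight` are hypothetical, the toy data is invented for illustration, and the scoring (a plain weighted word-match count) stands in for whatever weighting the full method uses. The sketch also assumes that the synonym table and the CVP links were computed offline, and that image-level expansion appends images linked to the top-ranked results.

```python
from collections import defaultdict


def build_inverted_file(db_words):
    """Map each visual word ID to the set of database images that contain it."""
    inverted = defaultdict(set)
    for image_id, words in db_words.items():
        for word in words:
            inverted[word].add(image_id)
    return inverted


def search(query_words, inverted, csvw, cvp_links, top_k=2, synonym_weight=0.5):
    """Rank images by weighted word matches, then append CVP-linked images."""
    # Word-level expansion: original query words get weight 1.0, and their
    # contextual synonyms (assumed precomputed offline) get a lower weight.
    weights = {word: 1.0 for word in query_words}
    for word in query_words:
        for synonym in csvw.get(word, ()):
            weights.setdefault(synonym, synonym_weight)

    # Score database images through the inverted file.
    scores = defaultdict(float)
    for word, weight in weights.items():
        for image_id in inverted.get(word, ()):
            scores[image_id] += weight

    # Sort by descending score, breaking ties by image ID for determinism.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))[:top_k]

    # Image-level expansion: append images that share a CVP with a returned
    # image but are not yet in the result list (links assumed built offline).
    expanded = list(ranked)
    for image_id in ranked:
        for linked in cvp_links.get(image_id, ()):
            if linked not in expanded:
                expanded.append(linked)
    return expanded


# Hypothetical toy database: image ID -> quantized visual word IDs.
db_words = {"a": [1, 2, 3], "b": [4, 2], "c": [9], "d": [7, 8]}
csvw = {1: [4]}                        # word 4 is a contextual synonym of word 1
cvp_links = {"a": ["c"], "c": ["a"]}   # images "a" and "c" share a CVP

inverted = build_inverted_file(db_words)
baseline = search([1, 2], inverted, {}, {})
expanded = search([1, 2], inverted, csvw, cvp_links)
print(baseline, expanded)
```

In this toy run, image "c" shares no visual word with the query, so the baseline search cannot return it; it enters the expanded result list only through its CVP link to image "a". Because both lookup tables are plain dictionaries built ahead of time, the query-time overhead in this sketch is limited to a few extra lookups.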
