Multi-order visual phrase for scalable image search

Visual phrases capture extra spatial clues among single visual words and thus show better discriminative power than single visual words in image retrieval. Notwithstanding their success, existing visual phrases still show obvious shortcomings: 1) limited flexibility, i.e., visual phrases are considered for matching only if they contain the same number of visual words; 2) larger quantization error and low repeatability, i.e., quantization errors in visual words are aggregated in visual phrases, making them harder to match than single visual words. To avoid these issues, we propose the multi-order visual phrase, which contains two complementary clues: the center visual word quantized from the local descriptor of each image keypoint, and the visual and spatial clues of multiple nearby keypoints. Two multi-order visual phrases are flexibly matched by first comparing their center visual words, then estimating a match confidence by checking the spatial and visual consistency of their neighbor keypoints. Therefore, the multi-order visual phrase does not sacrifice the repeatability of the single visual word and is more robust to quantization error than existing visual phrases. We test the multi-order visual phrase on UKbench, Oxford5K, and 1 million distractor images collected from Flickr. Comparisons with recent retrieval approaches clearly demonstrate the competitive accuracy and significantly better efficiency of the multi-order visual phrase.
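The two-stage matching described above can be illustrated with a minimal sketch. It is not the authors' implementation: the neighbor encoding (a visual word ID paired with a coarse spatial bin), the names `MultiOrderPhrase` and `match_confidence`, and the scoring rule (one point for the center word plus one per consistent neighbor) are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical neighbor encoding: (visual_word_id, spatial_bin), where
# spatial_bin is a coarse quantization of the neighbor's position relative
# to the center keypoint. The abstract does not define this encoding.
Neighbor = Tuple[int, int]


@dataclass(frozen=True)
class MultiOrderPhrase:
    center_word: int                 # visual word of the center keypoint
    neighbors: Tuple[Neighbor, ...]  # any number of nearby keypoints


def match_confidence(a: MultiOrderPhrase, b: MultiOrderPhrase) -> int:
    """Return 0 if the center words differ; otherwise 1 plus the number of
    neighbors that agree in both visual word and spatial bin.

    This scoring rule is an illustrative assumption, not the paper's formula.
    """
    # Stage 1: the center visual words must match, as with single visual words.
    if a.center_word != b.center_word:
        return 0
    # Stage 2: count neighbors that are visually and spatially consistent.
    # The two phrases may hold different numbers of neighbors.
    consistent = set(a.neighbors) & set(b.neighbors)
    return 1 + len(consistent)


query = MultiOrderPhrase(7, ((3, 0), (5, 2), (9, 1)))
candidate = MultiOrderPhrase(7, ((3, 0), (5, 1)))
print(match_confidence(query, candidate))
```

Because the center word alone already yields a nonzero score in this sketch, a phrase whose neighbors are all misquantized still matches, only with lower confidence. That mirrors the abstract's claim that repeatability is no worse than that of a single visual word, and the phrases need not contain the same number of visual words.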
