Effective near-duplicate image retrieval with image-specific visual phrase selection

Near-duplicate image retrieval (NDIR) is an important topic for many applications, such as multimedia content management and copyright infringement identification. In this work we propose a novel NDIR framework based on visual phrases. In contrast to previous work, this paper first introduces a spatial visual phrase (SVP) model that captures the relative geometry between visual words. It then proposes an image-specific strategy for selecting descriptive SVPs. This strategy not only handles the phrase sparseness problem that occurs in traditional selection strategies but also allows visual phrases to be selected according to the characteristics of each image. Experiments on the UKBench and TRECVID datasets show encouraging results, demonstrating that both the SVP model and the selection strategy significantly improve overall performance.
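
The abstract does not state how an SVP is encoded or how descriptiveness is scored, so the following Python sketch is only an illustration under stated assumptions, not the paper's method. It assumes that an SVP is a pair of nearby visual words plus a quantized direction between them, and that "image-specific" selection means keeping the top-k phrases within each image rather than applying one global threshold. The names `spatial_phrases` and `select_phrases_per_image` and the parameters `radius`, `n_bins`, and `k` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def spatial_phrases(keypoints, radius=50.0, n_bins=4):
    """Count word pairs with a quantized relative direction.

    keypoints: list of (word_id, x, y) tuples for one image.
    Returns a Counter keyed by (word_a, word_b, direction_bin).
    Assumption: this pairwise encoding stands in for the paper's SVP model.
    """
    phrases = Counter()
    for (wa, xa, ya), (wb, xb, yb) in combinations(keypoints, 2):
        # Only spatially close words form a phrase.
        if math.hypot(xb - xa, yb - ya) > radius:
            continue
        # Canonical ordering so the key does not depend on enumeration order.
        if (wa, xa, ya) > (wb, xb, yb):
            wa, xa, ya, wb, xb, yb = wb, xb, yb, wa, xa, ya
        # Quantize the direction from the first word to the second.
        angle = math.atan2(yb - ya, xb - xa) % (2 * math.pi)
        direction = int(angle / (2 * math.pi) * n_bins) % n_bins
        phrases[(wa, wb, direction)] += 1
    return phrases


def select_phrases_per_image(phrases, k=2):
    """Keep the k most frequent phrases of a single image.

    Assumption: in-image frequency is used as a stand-in for descriptiveness.
    """
    return [phrase for phrase, _ in phrases.most_common(k)]


# Toy image: the word pair (3, 7) occurs twice with the same layout.
keypoints = [(3, 0, 0), (7, 10, 0), (3, 100, 100), (7, 110, 100), (5, 0, 10)]
phrases = spatial_phrases(keypoints)
selected = select_phrases_per_image(phrases, k=1)
```

Because selection is done per image, every image that contains at least one phrase retains some phrases, which is one plausible reading of how an image-specific strategy could sidestep phrase sparseness.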