Coherent Phrase Model for Efficient Image Near-Duplicate Retrieval

This paper presents an efficient and effective solution for retrieving image near-duplicates (INDs) from an image database. We introduce the coherent phrase model, which incorporates the coherency of local regions to reduce the quantization error of the bag-of-words (BoW) model. In this model, each local region is characterized by a visual phrase of multiple descriptors instead of a visual word of a single descriptor. We propose two types of visual phrase to encode coherency in the feature and spatial domains, respectively. The proposed model uses this coherency to reduce the number of false matches and generates sparse representations of images. Compared to other methods, the local coherency among the multiple descriptors of every region improves performance while preserving efficiency for IND retrieval. The proposed method is evaluated on several benchmark datasets for IND retrieval. Compared to state-of-the-art methods, it significantly improves the accuracy of IND retrieval while maintaining the efficiency of the standard bag-of-words model. It can also be integrated with other extensions of BoW.
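The core idea above, that a region is represented by a phrase of several quantized descriptors and that two regions match only when every word in the phrase agrees, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy codebooks, the nearest-codeword quantizer, and the use of histogram intersection as the similarity are all assumptions made for illustration, and the sketch does not distinguish the paper's two phrase types.

```python
from collections import Counter

def nearest_word(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[i])),
    )

def visual_phrase(descriptors, codebooks):
    """Quantize each of a region's K descriptors with its own codebook.

    The resulting K-tuple of visual words is the region's phrase (hypothetical helper).
    """
    return tuple(nearest_word(d, cb) for d, cb in zip(descriptors, codebooks))

def phrase_histogram(regions, codebooks):
    """Sparse image representation: only phrases that actually occur are stored."""
    return Counter(visual_phrase(r, codebooks) for r in regions)

def similarity(h1, h2):
    """Histogram intersection (assumed measure): a region pair counts only if
    every word in the two phrases is identical."""
    return sum(min(count, h2.get(phrase, 0)) for phrase, count in h1.items())

# Toy data (assumed): K = 2 descriptors per region, one small codebook per descriptor type.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],  # codebook for the first descriptor
    [(0.0,), (10.0,)],         # codebook for the second descriptor
]
region_q = [(0.1, 0.0), (0.5,)]      # query region
region_dup = [(0.0, 0.2), (1.0,)]    # agrees with the query on both words
region_false = [(0.2, 0.1), (9.0,)]  # agrees on the first word only

h_query = phrase_histogram([region_q], codebooks)
sim_dup = similarity(h_query, phrase_histogram([region_dup], codebooks))
sim_false = similarity(h_query, phrase_histogram([region_false], codebooks))
```

In this toy setting, `region_false` would be matched to the query by a single-descriptor BoW model that uses only the first codebook, because both fall on word 0. Under the phrase representation the second word differs, so the match is rejected, which is the kind of false-match reduction the abstract describes.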
