Small Vocabulary with Saliency Matching for Video Copy Detection

The importance of copy detection has led to a substantial amount of research in recent years, in which the Bag of visual Words (BoW) model plays an important role because it handles occlusion and minor transformations effectively. One crucial issue in BoW approaches is the size of the vocabulary. With a small vocabulary, BoW descriptors can be both robust and efficient while maintaining a high recall rate compared with a large vocabulary. However, the high false-positive rate of a small vocabulary limits its application. To address this problem, we propose a novel matching algorithm based on salient visual word selection. More specifically, the variation of each visual word across a given video is represented as a trajectory, and the visual words whose trajectories contain locally asymptotically stable points are selected as salient visual words. We then measure the similarity of two videos through saliency matching, based only on the selected salient visual words, to remove false positives. Our experiments show that a small codebook with saliency matching is quite competitive in video copy detection. With the incorporation of the proposed saliency matching, precision improves by 30% on average compared with the state-of-the-art technique. Moreover, the proposed method is capable of detecting severe transformations, e.g. picture in picture and post production.
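The selection-then-matching idea can be illustrated with a minimal sketch. The abstract does not define the stability criterion or the similarity measure, so both are stand-ins here: a visual word is treated as salient if its per-frame count stays nonzero and nearly constant over some window of consecutive frames, and two videos are compared by the Jaccard overlap of their salient word sets. The function names and the `window` and `tol` parameters are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Set


def word_trajectories(frame_histograms: Sequence[Sequence[float]]) -> List[List[float]]:
    """Turn per-frame BoW histograms into one count trajectory per visual word."""
    return [list(counts) for counts in zip(*frame_histograms)]


def is_salient(trajectory: Sequence[float], window: int = 3, tol: float = 1.0) -> bool:
    """Stand-in stability test (an assumption, not the paper's criterion):
    the word is present in every frame of some window of consecutive frames
    and its count varies by at most `tol` inside that window."""
    for start in range(len(trajectory) - window + 1):
        segment = trajectory[start:start + window]
        if min(segment) > 0 and max(segment) - min(segment) <= tol:
            return True
    return False


def salient_words(frame_histograms: Sequence[Sequence[float]],
                  window: int = 3, tol: float = 1.0) -> Set[int]:
    """Indices of the visual words that pass the stand-in stability test."""
    trajectories = word_trajectories(frame_histograms)
    return {w for w, traj in enumerate(trajectories) if is_salient(traj, window, tol)}


def saliency_similarity(video_a: Sequence[Sequence[float]],
                        video_b: Sequence[Sequence[float]],
                        window: int = 3, tol: float = 1.0) -> float:
    """Jaccard overlap of the two salient word sets (an assumed similarity measure)."""
    a = salient_words(video_a, window, tol)
    b = salient_words(video_b, window, tol)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy data: 4 frames, vocabulary of 4 visual words (rows are frames).
query = [[2, 0, 1, 0], [2, 1, 0, 0], [3, 0, 1, 0], [2, 0, 0, 5]]
reference = [[2, 0, 0, 4], [3, 0, 0, 4], [2, 1, 0, 5], [2, 0, 0, 4]]

print(salient_words(query))                   # {0}
print(saliency_similarity(query, reference))  # 0.5
```

In the toy data, words that appear only sporadically (such as word 1) are ignored by the comparison, which is the intuition behind restricting matching to salient words in order to suppress false positives.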
