INRIA-LEAR's Video Copy Detection System

A video copy detection system is a content-based search engine [1]. It decides whether a query video segment is a copy of a video from the indexed dataset. A copy may be distorted in various ways. If the system finds a matching video segment, it returns the name of the database video and the time stamp from which the query was copied. Fig. 1 illustrates the video copy detection system we have developed for the TRECVID 2008 evaluation campaign. The components of this system are detailed in Section 2. Most of them are derived from the state-of-the-art image search engine introduced in [2]. That engine builds upon the bag-of-features image search system proposed in [3], and provides a more precise representation by adding 1) a Hamming embedding (HE) and 2) weak geometric consistency (WGC) constraints. HE provides binary signatures that refine the visual-word-based matching. WGC filters out matching descriptors that are not consistent in terms of angle and scale. HE and WGC are integrated within an inverted file and are efficiently exploited for all indexed frames, even for a very large dataset. In our best runs, we have indexed 2 million keyframes, represented by 800 million local descriptors. We give some conclusions drawn from our experiments in Section 3. Finally, in Section 4 we briefly present our run for the high-level feature detection task.
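The interplay of the inverted file, the binary signatures, and the angle/scale consistency check can be illustrated with a small sketch. It is not the system's implementation: the class name `InvertedFile`, the threshold and bin counts, the 4-bit signatures, and the scoring rule (the smaller of the two histogram peaks) are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict

HAMMING_THRESHOLD = 2  # assumed value, for illustration only
ANGLE_BINS = 8         # assumed quantization of angle differences
SCALE_BINS = 8         # assumed quantization of log-scale differences


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary signatures."""
    return bin(a ^ b).count("1")


class InvertedFile:
    """Hypothetical inverted file: one posting list per visual word."""

    def __init__(self):
        # visual word -> list of (frame_id, signature, angle, scale)
        self.lists = defaultdict(list)

    def add(self, frame_id, word, signature, angle, scale):
        self.lists[word].append((frame_id, signature, angle, scale))

    def query(self, descriptors):
        """Score indexed frames against query descriptors.

        Each query descriptor is (word, signature, angle, scale).
        """
        angle_votes = defaultdict(lambda: [0] * ANGLE_BINS)
        scale_votes = defaultdict(lambda: [0] * SCALE_BINS)
        for word, sig, angle, scale in descriptors:
            for frame_id, sig_db, angle_db, scale_db in self.lists.get(word, []):
                # HE-style refinement: same visual word is not enough,
                # the binary signatures must also be close.
                if hamming(sig, sig_db) > HAMMING_THRESHOLD:
                    continue
                # WGC-style voting: quantize the angle difference...
                d_angle = (angle_db - angle) % (2 * math.pi)
                a_bin = int(d_angle / (2 * math.pi) * ANGLE_BINS) % ANGLE_BINS
                # ...and the log-scale difference, clamped to the histogram.
                d_scale = int(round(math.log2(scale_db / scale)))
                s_bin = min(max(d_scale + SCALE_BINS // 2, 0), SCALE_BINS - 1)
                angle_votes[frame_id][a_bin] += 1
                scale_votes[frame_id][s_bin] += 1
        # Assumed scoring: a frame is only as good as its weaker peak,
        # so matches must agree on both rotation and scale change.
        return {f: min(max(angle_votes[f]), max(scale_votes[f]))
                for f in angle_votes}


index = InvertedFile()
index.add("A", word=1, signature=0b1011, angle=0.5, scale=2.0)
index.add("A", word=2, signature=0b1100, angle=1.5, scale=4.0)
query = [(1, 0b1010, 0.0, 1.0), (2, 0b1100, 1.0, 2.0)]
print(index.query(query))
```

In this toy example both matches to frame "A" share the same rotation and the same scale change, so their votes accumulate in a single bin of each histogram. Matches with scattered angle or scale differences would spread over several bins and contribute little to the score.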

[1] Andrew Zisserman, et al. Video Google: a text retrieval approach to object matching in videos, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[2] Anthony Widjaja, et al. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, 2003, IEEE Transactions on Neural Networks.

[3] David G. Lowe, et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004.

[4] Cordelia Schmid, et al. Scale & Affine Invariant Interest Point Detectors, 2004, International Journal of Computer Vision.

[5] Tony Lindeberg, et al. Feature Detection with Automatic Scale Selection, 1998, International Journal of Computer Vision.

[6] Lixin Fan, et al. Categorizing Nine Visual Classes using Local Appearance Descriptors, 2004.

[7] Paul Over, et al. Evaluation campaigns and TRECVid, 2006, MIR '06.

[8] Cordelia Schmid, et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9] Cordelia Schmid, et al. Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[10] Michael Isard, et al. Object retrieval with large vocabularies and fast spatial matching, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Li Chen, et al. Video copy detection: a comparative study, 2007, CIVR '07.

[12] Cordelia Schmid, et al. A contextual dissimilarity measure for accurate and efficient image search, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13] Cordelia Schmid, et al. Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search, 2008, ECCV.

[14] Stéphane Ayache, et al. Video Corpus Annotation Using Active Learning, 2008, ECIR.