Design and evaluation of an effective and efficient video copy detection system

We present the end-to-end design and evaluation of a video copy detection system that is both effective and efficient, bridging the gap between computationally expensive methods and practical applications. The system uses a compact SIFT-based bag-of-words fingerprint, which we call a SIFTogram, that requires only 1000 bytes per second of video. We show that, beyond the choice of descriptor, many other design variables affect performance. We also consider a complementary color-based descriptor which, contrary to popular recent belief, outperforms the SIFTogram on some transforms. We emphasize robustness to the transformations most common on content-sharing sites, and report a 99.3% detection rate with zero false alarms on one such transform category from a standardized evaluation. We evaluate the system on two TRECVID benchmark datasets and examine its trade-off between speed and accuracy relative to other TRECVID submissions.
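To make the idea of a bag-of-words fingerprint concrete, the following is a minimal sketch, not the system's actual implementation. It assumes that local descriptors (such as SIFT vectors) have already been extracted for a frame and that a visual-word codebook is available. The function names, the one-byte-per-bin quantization, and the histogram-intersection similarity are all illustrative assumptions. Under these assumptions, a 1000-word codebook sampled once per second would yield 1000 bytes per second of video, but the abstract does not state the actual codebook size or sampling rate.

```python
from typing import Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codeword closest to the descriptor
    (squared Euclidean distance, brute-force search)."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_fingerprint(descriptors: Sequence[Vector],
                    codebook: Sequence[Vector]) -> bytes:
    """Build a bag-of-words histogram with one byte per visual word.

    Assumption: counts are scaled so the most frequent word maps to 255.
    This byte layout is hypothetical, chosen only to keep the sketch compact.
    """
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    peak = max(counts, default=0)
    if peak == 0:
        return bytes(len(codebook))  # frame with no descriptors: all zeros
    return bytes((255 * c) // peak for c in counts)


def histogram_intersection(a: bytes, b: bytes) -> float:
    """Similarity in [0, 1] between two fingerprints of equal length."""
    denom = min(sum(a), sum(b))
    if denom == 0:
        return 0.0
    return sum(min(x, y) for x, y in zip(a, b)) / denom
```

In this sketch, two frames are compared by calling `histogram_intersection` on their fingerprints. A practical system would replace the brute-force `nearest_word` search with an approximate nearest-neighbor index and would match sequences of fingerprints over time rather than single frames.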
