Shot aggregating strategy for near-duplicate video retrieval

In this paper, we propose a new strategy for near-duplicate video retrieval based on shot aggregation. We investigate different shot aggregation methods, with the main objective of resolving the difficult trade-off between performance, scalability and speed. The proposed shot aggregation consists of two steps: keyframe selection, followed by aggregation of the selected keyframes per shot. The aggregation is performed by applying the Fisher vector to the descriptors computed on the selected keyframes. We demonstrate that scalability and speed are addressed by a sparse video analysis approach (i.e., extracting only a few keyframes) combined with shot aggregation, while performance is discussed in terms of the choice of aggregation strategy. Performance is evaluated on the CC_WEB_VIDEO dataset, which is designed for assessing near-duplicate video retrieval and on which several authors have reported experiments.
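As an illustration of the aggregation step, the following minimal sketch pools the local descriptors of all selected keyframes in a shot and encodes them as a single Fisher vector. It rests on several assumptions not stated in the abstract: a diagonal-covariance Gaussian mixture model whose parameters (`weights`, `means`, `sigmas`) have already been trained, gradients taken with respect to means and standard deviations only, and power plus L2 normalization as in the improved Fisher vector. The function names `fisher_vector` and `aggregate_shot` are hypothetical, and pooling descriptors before encoding is only one of several possible aggregation strategies.

```python
import numpy as np


def fisher_vector(X, weights, means, sigmas):
    """Encode descriptors X (N x D) as a Fisher vector of length 2*K*D.

    Assumes a diagonal-covariance GMM with K components:
    weights (K,), means (K x D), sigmas (K x D, standard deviations).
    """
    X = np.asarray(X, dtype=np.float64)
    N, D = X.shape

    # Whitened differences to every component: shape (N, K, D).
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]

    # Log of weighted Gaussian densities, shape (N, K).
    log_p = (
        np.log(weights)[None, :]
        - 0.5 * np.sum(diff ** 2, axis=2)
        - np.sum(np.log(sigmas), axis=1)[None, :]
        - 0.5 * D * np.log(2.0 * np.pi)
    )

    # Soft assignments (posteriors), computed stably via a max shift.
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Gradients with respect to means and standard deviations.
    g_mu = np.einsum("nk,nkd->kd", gamma, diff)
    g_mu /= N * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("nk,nkd->kd", gamma, diff ** 2 - 1.0)
    g_sigma /= N * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Power normalization followed by L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


def aggregate_shot(keyframe_descriptors, weights, means, sigmas):
    """Pool descriptors from all keyframes of one shot, then encode them.

    keyframe_descriptors: list of (n_i x D) arrays, one per selected keyframe.
    Pooling before encoding is an assumption made for this sketch.
    """
    pooled = np.vstack(keyframe_descriptors)
    return fisher_vector(pooled, weights, means, sigmas)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, D = 4, 8  # toy sizes, chosen only for the demonstration
    weights = np.full(K, 1.0 / K)
    means = rng.normal(size=(K, D))
    sigmas = np.ones((K, D))
    shot = [rng.normal(size=(50, D)) for _ in range(3)]  # 3 keyframes
    print(aggregate_shot(shot, weights, means, sigmas).shape)
```

Because each shot vector is L2-normalized, two shots can be compared with a plain dot product, which keeps the per-shot cost fixed regardless of how many keyframes were selected.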
