Two-layer video fingerprinting strategy for near-duplicate video detection

Because visual similarity is central to near-duplicate video detection, visual features are the primary features used to generate video fingerprints. However, the mutual assistance among visual features and the discriminatory power of the semantic features of videos have not been well explored in video fingerprinting. To address these issues, this paper proposes two layers of video fingerprints. We first generate a Low-level Representation Fingerprint (LRF) from handcrafted visual features using a tensor-based model, which captures the mutual relations among the multiple visual features. Next, we use a convolutional neural network (CNN) model to learn deep semantic features and generate a Deep Representation Fingerprint (DRF), which provides heterogeneous assistance to the LRF. As a result, the video fingerprinting system uses both the mutual relations among multiple handcrafted visual features and the assistance from semantic features. During the matching stage, DRF matching is performed first, followed by LRF matching. Experimental results show that the proposed method outperforms approaches that use either technique individually.
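The two-stage matching order described above (DRF first, then LRF) can be illustrated with a minimal sketch. The abstract does not specify the distance measures, thresholds, or fingerprint formats, so everything below is an assumption made for illustration: the DRF is treated as a real-valued vector compared by Euclidean distance and used as a coarse filter, the LRF is treated as a binary sequence compared by Hamming distance, and the names `two_layer_match`, `drf_threshold`, and `lrf_threshold` are hypothetical.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length real-valued vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def two_layer_match(query, database, drf_threshold, lrf_threshold):
    """Return (video_id, lrf_distance) pairs that pass both stages.

    query:    (drf, lrf) tuple for the query video (assumed format).
    database: dict mapping video_id -> (drf, lrf).
    Both thresholds are hypothetical tuning parameters.
    """
    q_drf, q_lrf = query

    # Stage 1 (assumed): DRF matching keeps only videos whose deep
    # fingerprint lies within drf_threshold of the query's.
    candidates = [
        vid for vid, (drf, _) in database.items()
        if euclidean(q_drf, drf) <= drf_threshold
    ]

    # Stage 2 (assumed): LRF matching verifies the surviving candidates
    # with the low-level fingerprint.
    matches = []
    for vid in candidates:
        distance = hamming(q_lrf, database[vid][1])
        if distance <= lrf_threshold:
            matches.append((vid, distance))

    # Closest LRF match first.
    return sorted(matches, key=lambda m: m[1])


# Toy data: "c" is semantically far from the query, "b" differs in one LRF bit.
database = {
    "a": ([0.0, 0.0], [1, 0, 1, 1]),
    "b": ([0.1, 0.0], [1, 0, 0, 1]),
    "c": ([5.0, 5.0], [1, 0, 1, 1]),
}
query = ([0.0, 0.05], [1, 0, 1, 1])
result = two_layer_match(query, database, drf_threshold=0.5, lrf_threshold=1)
```

In this toy setup, video "c" is discarded at the DRF stage even though its LRF is identical to the query's, which is one way the deep fingerprint could assist the low-level one; the paper's actual matching rules may differ.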
