Deep Video Hashing

In this work, we propose a deep video hashing (DVH) method for scalable video search. Unlike most existing video hashing methods that first extract features for each single frame and then use conventional image hashing techniques, our DVH learns binary codes for the entire video with a deep learning framework so that both the temporal and discriminative information can be well exploited. Specifically, we fuse the temporal information across different frames within each video to learn the feature representation under two criteria: the distance between a feature pair obtained at the top layer is small if they are from the same class, and large if they are from different classes; and the quantization loss between the real-valued features and the binary codes is minimized. We exploit different deep architectures to utilize spatial-temporal information in different manners and compare them with single-frame-based deep models and state-of-the-art image hashing methods. Experimental results demonstrate the effectiveness of our proposed method.

[1]  Heng Tao Shen,et al.  Hashing on Nonlinear Manifolds , 2014, IEEE Transactions on Image Processing.

[2]  Shih-Fu Chang,et al.  Query-Adaptive Image Search With Hash Codes , 2013, IEEE Transactions on Multimedia.

[3]  Shiguang Shan,et al.  Hierarchical hybrid statistic based video binary code and its application to face retrieval in TV-series , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[4]  Shih-Fu Chang,et al.  Submodular video hashing: a unified framework towards video pooling and indexing , 2012, ACM Multimedia.

[5]  Christian Wolf,et al.  Sequential Deep Learning for Human Action Recognition , 2011, HBU.

[6]  Yang Yang,et al.  Discriminant Cross-modal Hashing , 2016, ICMR.

[7]  Jianmin Wang,et al.  Deep Quantization Network for Efficient Image Retrieval , 2016, AAAI.

[8]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[9]  Xiaogang Wang,et al.  Slicing Convolutional Neural Network for Crowd Video Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Rongrong Ji,et al.  Supervised hashing with kernels , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Cordelia Schmid,et al.  Multi-region Two-Stream R-CNN for Action Detection , 2016, ECCV.

[12]  Shiguang Shan,et al.  Compact Video Code and Its Application to Robust Face Retrieval in TV-Series , 2014, BMVC.

[13]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Xi Wang,et al.  Evaluating Two-Stream CNN for Video Classification , 2015, ICMR.

[15]  James L. McClelland,et al.  Learning Subsequential Structure in Simple Recurrent Networks , 1988, NIPS.

[16]  Svetlana Lazebnik,et al.  Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.

[17]  Chien-Li Chou,et al.  Pattern-Based Near-Duplicate Video Retrieval and Localization on Web-Scale Videos , 2015, IEEE Transactions on Multimedia.

[18]  Jing Wang,et al.  A visual saliency based video hashing algorithm , 2012, 2012 19th IEEE International Conference on Image Processing.

[19]  Cordelia Schmid,et al.  An Image-Based Approach to Video Copy Detection With Spatio-Temporal Post-Filtering , 2010, IEEE Transactions on Multimedia.

[20]  Lei Huang,et al.  Query-Adaptive Hash Code Ranking for Large-Scale Multi-View Visual Search , 2016, IEEE Transactions on Image Processing.

[21]  Lei Zhang,et al.  Bit-Scalable Deep Hashing With Regularized Similarity Learning for Image Retrieval and Person Re-Identification , 2015, IEEE Transactions on Image Processing.

[22]  Shih-Fu Chang,et al.  Consumer video understanding: a benchmark database and an evaluation of human and machine performance , 2011, ICMR.

[23]  Gang Hua,et al.  Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Jian Sun,et al.  K-Means Hashing: An Affinity-Preserving Quantization Method for Learning Binary Compact Codes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Jianmin Wang,et al.  Deep Hashing Network for Efficient Similarity Retrieval , 2016, AAAI.

[26]  Michael R. Lyu,et al.  A Multimodal and Multilevel Ranking Scheme for Large-Scale Video Retrieval , 2008, IEEE Transactions on Multimedia.

[27]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[28]  Zi Huang,et al.  Scalable Video Event Retrieval by Visual State Binary Embedding , 2016, IEEE Transactions on Multimedia.

[29]  Meng Wang,et al.  Deep Learning of Scene-Specific Classifier for Pedestrian Detection , 2014, ECCV.

[30]  Cordelia Schmid,et al.  P-CNN: Pose-Based CNN Features for Action Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  Tieniu Tan,et al.  Deep semantic ranking based hashing for multi-label image retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Guiguang Ding,et al.  Latent semantic sparse hashing for cross-modal similarity search , 2014, SIGIR.

[33]  Matthew J. Hausknecht,et al.  Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Shih-Fu Chang,et al.  Locally Linear Hashing for Extracting Non-linear Manifolds , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[36]  Martha Larson,et al.  Intent-Aware Video Search Result Optimization , 2014, IEEE Transactions on Multimedia.

[37]  Qi Tian,et al.  Towards Codebook-Free: Scalable Cascaded Hashing for Mobile Image Search , 2014, IEEE Transactions on Multimedia.

[38]  Fumin Shen,et al.  Inductive Hashing on Manifolds , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[40]  Nicu Sebe,et al.  A Survey on Learning to Hash , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Nasir D. Memon,et al.  Spatio–Temporal Transform Based Video Hashing , 2006, IEEE Transactions on Multimedia.

[42]  Jürgen Schmidhuber,et al.  Multimodal Similarity-Preserving Hashing , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Andrea Vedaldi,et al.  MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.

[45]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[47]  Jiwen Lu,et al.  Deep hashing for compact binary codes learning , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Jitendra Malik,et al.  Finding action tubes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Wei Liu,et al.  Hashing with Graphs , 2011, ICML.

[51]  Wen Gao,et al.  Weighted Component Hashing of Binary Aggregated Descriptors for Fast Visual Search , 2015, IEEE Transactions on Multimedia.

[52]  Meng Wang,et al.  Spectral Hashing With Semantically Consistent Graph for Image Indexing , 2013, IEEE Transactions on Multimedia.

[53]  David Suter,et al.  Fast Supervised Hashing with Decision Trees for High-Dimensional Data , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[54]  Li Li,et al.  A Survey on Visual Content-Based Video Indexing and Retrieval , 2011, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[55]  Jianmin Wang,et al.  Semantics-preserving hashing for cross-view retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Zhou Yu,et al.  Sparse Multi-Modal Hashing , 2014, IEEE Transactions on Multimedia.

[57]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[58]  Thomas Serre,et al.  HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.

[59]  Yi Yang,et al.  A discriminative CNN video representation for event detection , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[61]  Wei Liu,et al.  Discrete Graph Hashing , 2014, NIPS.

[62]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[63]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[64]  Wen Gao,et al.  Learning to Distribute Vocabulary Indexing for Scalable Visual Search , 2013, IEEE Transactions on Multimedia.

[65]  Cordelia Schmid,et al.  Towards Understanding Action Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[66]  Xuelong Li,et al.  Large-Scale Unsupervised Hashing with Shared Structure Learning , 2015, IEEE Transactions on Cybernetics.

[67]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[68]  Shih-Fu Chang,et al.  Semi-supervised hashing for scalable image retrieval , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[69]  Dacheng Tao,et al.  Multilinear Hyperplane Hashing , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[70]  Zi Huang,et al.  Effective Multiple Feature Hashing for Large-Scale Near-Duplicate Video Retrieval , 2013, IEEE Transactions on Multimedia.

[71]  Dong Liu,et al.  Large-Scale Video Hashing via Structure Learning , 2013, 2013 IEEE International Conference on Computer Vision.