Supervised Recurrent Hashing for Large Scale Video Retrieval

Hashing for large-scale multimedia is a popular research topic that has attracted much attention in computer vision and visual information retrieval. Previous works mostly focus on hashing images and texts, while approaches designed for videos are limited. In this paper, we propose \textit{Supervised Recurrent Hashing} (SRH), which exploits the discriminative representations obtained by deep neural networks to design a hashing approach. A long short-term memory (LSTM) network is deployed to model the structure of video samples. A max-pooling mechanism is introduced to embed the frames into fixed-length representations that are fed into a supervised hashing loss. Experiments on the UCF-101 dataset demonstrate that the proposed method significantly outperforms several state-of-the-art methods.
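
As a rough illustration of the pooling and binarization steps described above, the following Python sketch max-pools per-frame feature vectors into a fixed-length representation and thresholds a linear projection to obtain a binary code. It is not the SRH implementation: the random arrays stand in for LSTM hidden states, the random `projection` matrix stands in for a hashing layer that would be learned with the supervised loss, and the function names `max_pool_frames`, `binarize`, and `hamming_distance` are hypothetical.

```python
import numpy as np


def max_pool_frames(hidden_states):
    """Collapse a (num_frames, dim) array into one dim-length vector."""
    return hidden_states.max(axis=0)


def binarize(embedding, projection):
    """Project an embedding and threshold at zero to get a {0, 1} code."""
    return (embedding @ projection > 0).astype(np.uint8)


def hamming_distance(code_a, code_b):
    """Count the bit positions where two codes differ."""
    return int(np.count_nonzero(code_a != code_b))


rng = np.random.default_rng(0)
dim, bits = 16, 8

# Assumption: a random matrix replaces the learned hashing layer.
projection = rng.standard_normal((dim, bits))

# Assumption: random arrays replace LSTM outputs for videos of 5 and 12 frames.
video_short = rng.standard_normal((5, dim))
video_long = rng.standard_normal((12, dim))

# Videos of different lengths map to codes of the same length.
code_short = binarize(max_pool_frames(video_short), projection)
code_long = binarize(max_pool_frames(video_long), projection)

print(code_short, code_long, hamming_distance(code_short, code_long))
```

Because the pooling runs over the frame axis, videos with different numbers of frames yield codes of identical length, which is what allows them to be compared by Hamming distance at retrieval time.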
