Stochastic Multiview Hashing for Large-Scale Near-Duplicate Video Retrieval

Near-duplicate video retrieval (NDVR) has been a significant research task in multimedia given its high impact in applications such as video search, recommendation, and copyright protection. Beyond accurate retrieval, the exponential growth of online videos places heavy demands on the efficiency and scalability of existing systems. Aiming to improve both retrieval accuracy and speed, we propose a novel stochastic multiview hashing algorithm to facilitate the construction of a large-scale NDVR system. Reliable mapping functions that convert multiple types of keyframe features into binary hash code strings are learned, with the help of auxiliary information such as video-keyframe association and ground-truth relevance, by maximizing a mixture of the generalized retrieval precision and recall scores. A composite Kullback-Leibler divergence measure is used to approximate these retrieval scores; it stochastically aligns the neighborhood structures of the original feature space and the relaxed hash code space. The efficiency and effectiveness of the proposed method are examined on two public near-duplicate video collections and compared against various classical and state-of-the-art NDVR systems.
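The abstract does not give the exact objective, so the following is only a minimal sketch of the general idea, under several stated assumptions. Neighborhoods are modeled SNE-style as conditional probabilities p_{j|i} from a Gaussian kernel with a fixed bandwidth; the "relaxed" hash codes are taken to be tanh of a sum of per-view linear projections (the helpers `relaxed_codes`, `composite_kl`, and the weight `lam` are hypothetical names, not the authors' formulation); the original-space neighborhoods are computed from simply concatenated view features; and the auxiliary information (video-keyframe association, ground-truth relevance) is not modeled. The composite divergence mixes KL(P||Q), which penalizes true neighbors that are missed in code space (a recall-like term), with KL(Q||P), which penalizes false neighbors in code space (a precision-like term). The sketch only evaluates the objective and binarizes the codes; an actual method would minimize it over the projections.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighbor_probs(rows, sigma=1.0):
    """SNE-style conditional probabilities p_{j|i}: a Gaussian kernel on squared
    distances, normalized over j != i (so p_{i|i} = 0). Fixed sigma is an assumption."""
    n = len(rows)
    probs = []
    for i in range(n):
        w = [0.0 if j == i else math.exp(-sq_dist(rows[i], rows[j]) / (2 * sigma ** 2))
             for j in range(n)]
        s = sum(w)
        probs.append([x / s for x in w])
    return probs


def relaxed_codes(views, projections):
    """Hypothetical multiview mapping: h_i = tanh(sum_v W_v^T x_i^(v)),
    i.e. one linear projection (d_v x b matrix) per view, relaxed with tanh."""
    n = len(views[0])
    b = len(projections[0][0])
    codes = []
    for i in range(n):
        z = [0.0] * b
        for X, W in zip(views, projections):
            for k in range(b):
                z[k] += sum(X[i][d] * W[d][k] for d in range(len(W)))
        codes.append([math.tanh(v) for v in z])
    return codes


def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; entries with p_j = 0 contribute nothing."""
    return sum(pj * math.log(pj / (qj + eps)) for pj, qj in zip(p, q) if pj > 0)


def composite_kl(P, Q, lam=0.5):
    """lam * KL(P||Q) (missed neighbors, recall-like) plus
    (1 - lam) * KL(Q||P) (false neighbors, precision-like), summed over queries i."""
    return sum(lam * kl(p, q) + (1 - lam) * kl(q, p) for p, q in zip(P, Q))


def binarize(codes):
    """Map relaxed codes to bits: 1 where h >= 0, else 0."""
    return [[1 if v >= 0 else 0 for v in h] for h in codes]


def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    random.seed(0)
    n, dims, bits = 6, [4, 3], 8
    # Two synthetic "views" of the same 6 keyframes (e.g. two feature types).
    views = [[[random.random() for _ in range(d)] for _ in range(n)] for d in dims]
    projections = [[[random.gauss(0, 1) for _ in range(bits)] for _ in range(d)] for d in dims]
    # Original-space neighborhoods: here simply from the concatenated views.
    P = neighbor_probs([sum((X[i] for X in views), []) for i in range(n)])
    H = relaxed_codes(views, projections)
    Q = neighbor_probs(H)
    print("composite KL:", composite_kl(P, Q, lam=0.5))
    codes = binarize(H)
    print("Hamming(0, 1):", hamming(codes[0], codes[1]))
```

Setting `lam` closer to 1 emphasizes not losing true neighbors (recall), while values closer to 0 emphasize not retrieving false neighbors (precision); at query time only the binarized codes and cheap Hamming distances are needed.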