Multiple feature hashing for real-time large scale near-duplicate video retrieval

Near-duplicate video retrieval (NDVR) has recently attracted considerable research attention due to the exponential growth of online videos. It is useful in many areas, such as copyright protection, video tagging, and online video usage monitoring. Most existing approaches use only a single feature to represent a video for NDVR. However, a single feature is often insufficient to characterize the video content. Moreover, while accuracy has been the main concern in the previous literature, the scalability of NDVR algorithms to large-scale video datasets has rarely been addressed. In this paper, we present a novel approach, Multiple Feature Hashing (MFH), to tackle both the accuracy and the scalability issues of NDVR. MFH preserves the local structure information of each individual feature and also globally considers the local structures of all the features to learn a group of hash functions, which map the video keyframes into the Hamming space and generate a series of binary codes to represent the video dataset. We evaluate our approach on a public video dataset and on a large-scale video dataset of 132,647 videos that we collected from YouTube. The experimental results show that the proposed method outperforms state-of-the-art techniques in both accuracy and efficiency.
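To make the retrieval side of this pipeline concrete, the following is a minimal sketch of how keyframe features could be mapped to binary codes and compared in Hamming space. It does not reproduce MFH: the hash functions here are random hyperplanes, used only as a stand-in for the learned functions, and the names `make_hash_functions`, `encode`, `hamming`, and `search` are hypothetical.

```python
import random


def make_hash_functions(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes.

    Assumption: random projections stand in for the hash functions
    that MFH would learn from multiple features.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(feature, hyperplanes):
    """Map a real-valued keyframe feature to an integer binary code.

    Each hyperplane contributes one bit: 1 if the feature lies on its
    non-negative side, 0 otherwise.
    """
    code = 0
    for w in hyperplanes:
        dot = sum(wi * xi for wi, xi in zip(w, feature))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def search(query_code, database_codes, radius):
    """Return (distance, index) pairs within the Hamming radius, nearest first."""
    hits = []
    for idx, code in enumerate(database_codes):
        d = hamming(query_code, code)
        if d <= radius:
            hits.append((d, idx))
    return sorted(hits)


if __name__ == "__main__":
    planes = make_hash_functions(dim=4, n_bits=16, seed=1)
    keyframes = [[0.5, -1.0, 2.0, 0.1], [-0.5, 1.0, -2.0, -0.1]]
    database = [encode(f, planes) for f in keyframes]
    query = encode([0.5, -1.0, 2.0, 0.1], planes)
    print(search(query, database, radius=2))
```

Comparing codes with XOR and a bit count is what makes Hamming-space search cheap; a linear scan is shown for brevity, and a large collection would more likely use an index over the codes.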
