Fast distributed video deduplication via locality-sensitive hashing with similarity ranking

The exponentially growing volume of video data poses major challenges for video deduplication technology. Many deduplication approaches are under rapid development, but they are generally slow and their identification processes are somewhat inaccurate. To date, little work has studied a generic hash-based distributed framework or an efficient similarity ranking strategy for video deduplication. This paper proposes a flexible and fast distributed video deduplication framework based on hash codes. It supports hash table indexing with any existing hashing algorithm in a distributed environment, and it efficiently ranks candidate videos by exploiting the similarities among key frames over multiple tables using a MapReduce strategy. Experiments on a popular large-scale dataset demonstrate that the proposed framework achieves satisfactory video deduplication performance.
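
The general idea of ranking candidate videos by key-frame collisions over multiple hash tables can be illustrated with a small sketch. This is not the paper's implementation: it assumes each key frame is already a numeric feature vector, uses random-hyperplane LSH as one possible hashing algorithm, scores a video by simply counting bucket collisions, and simulates the map and reduce steps in a single process. All function names and parameters below are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, rng):
    """Draw num_bits random Gaussian hyperplanes (an assumed hashing choice)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(vec, planes):
    """Binary code: one sign bit per hyperplane."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_tables(frames, tables_planes):
    """Index (video_id, key-frame vector) pairs into one hash table per plane set."""
    tables = []
    for planes in tables_planes:
        table = defaultdict(list)
        for video_id, vec in frames:
            table[hash_code(vec, planes)].append(video_id)
        tables.append(table)
    return tables


def map_phase(query_frames, tables, tables_planes):
    """Map step: emit (video_id, 1) for every bucket collision with a query frame."""
    for vec in query_frames:
        for table, planes in zip(tables, tables_planes):
            for video_id in table.get(hash_code(vec, planes), []):
                yield video_id, 1


def reduce_phase(pairs):
    """Reduce step: sum collision counts per video and sort by descending score."""
    scores = defaultdict(int)
    for video_id, count in pairs:
        scores[video_id] += count
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def rank_candidates(query_frames, frames, num_tables=4, num_bits=8, seed=0):
    """Rank indexed videos by how often their key frames collide with the query's."""
    rng = random.Random(seed)
    dim = len(frames[0][1])
    tables_planes = [make_hyperplanes(num_bits, dim, rng) for _ in range(num_tables)]
    tables = build_tables(frames, tables_planes)
    return reduce_phase(map_phase(query_frames, tables, tables_planes))
```

Calling `rank_candidates(query_frames, frames)` with `frames` as a list of `(video_id, vector)` pairs returns `(video_id, score)` tuples, highest score first. In a real distributed setting the map and reduce functions would run on separate workers, and the collision count would likely be replaced by a finer-grained similarity measure.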
