Video similarity detection using fixed-length Statistical Dominant Colour Profile (SDCP) signatures

This paper presents a fast and effective technique for detecting and measuring the visual similarity of videos using compact fixed-length signatures. The proposed technique facilitates building real-time, scalable video matching and retrieval systems by generating a representative signature for a given video shot. The generated signature, the Statistical Dominant Colour Profile (SDCP), effectively encodes the spatio-temporal patterns of the colours in a shot, to support robust real-time matching. Furthermore, the SDCP signature is engineered to better address the visual similarity problem through its relaxed representation of shot contents. The compact, fixed-length nature of the signature is the key to its high matching speed (>1000 fps) compared with current techniques that rely on exhaustive processing, such as dense trajectories. The SDCP signature encodes a given video shot with only 294 values, regardless of the shot length, which allows fast signature extraction and matching. To maximize the benefit of the proposed technique, compressed-domain videos are used as a case study, given their wide availability. The technique avoids full video decompression and operates on tiny frames rather than full-size decompressed frames; this is achieved by using the sequence of tiny DC-images from the MPEG compressed stream. Experiments on various standard and challenging datasets (e.g. UCF101, 13k videos) show the technique's robust performance in terms of both retrieval ability and computational performance.
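To make the fixed-length idea concrete, the following is a minimal sketch in Python. It is not the SDCP construction, which the abstract does not specify. The layout used here (a hypothetical 7x7 grid of RGB cells per tiny frame, summarised by a temporal mean and a temporal standard deviation per cell) is an assumption chosen only because it happens to yield 294 numbers; the function names and the L1 distance are likewise illustrative. The point it shows is that the signature length, and hence the matching cost, does not depend on the number of frames in the shot.

```python
import random
from statistics import fmean, pstdev

# Hypothetical layout (an assumption, not taken from the paper):
# each tiny frame is a flat list of 7 * 7 * 3 = 147 colour values.
CELLS = 7 * 7 * 3


def toy_signature(frames):
    """Summarise a shot of any length as 294 values.

    For every cell position, take its mean and its population standard
    deviation over time, then concatenate the two halves (147 + 147).
    """
    if not frames:
        raise ValueError("a shot needs at least one frame")
    columns = list(zip(*frames))  # one tuple per cell, spanning all frames
    means = [fmean(col) for col in columns]
    spreads = [pstdev(col) for col in columns]
    return means + spreads


def l1_distance(sig_a, sig_b):
    """Sum of absolute differences; cost depends only on signature length."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))


def random_shot(n_frames, seed):
    """Synthetic stand-in for a sequence of tiny frames (illustration only)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(CELLS)] for _ in range(n_frames)]


short_sig = toy_signature(random_shot(12, seed=1))
long_sig = toy_signature(random_shot(300, seed=2))
print(len(short_sig), len(long_sig))  # prints: 294 294
```

Because both signatures have the same length, comparing a 12-frame shot with a 300-frame shot costs the same 294 subtractions as comparing two shots of equal length, which is the property the abstract credits for the high matching speed.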
