An Integrated Signature-Based Framework for Efficient Visual Similarity Detection and Measurement in Video Shots

This article presents a framework for fast video matching and retrieval based on detecting and measuring visual similarity. The framework’s efficiency stems from its ability to encode the content of a given shot into a compact fixed-length signature that supports robust real-time matching. Separate scene and motion signatures are developed and then fused to fully represent and match video shots. Scene information is captured through the Statistical Dominant Color Profile (SDCP), while motion information is captured through a graph-based signature called the Dominant Color Graph Profile (DCGP). The SDCP is a compact fixed-length signature that statistically encodes the spatiotemporal patterns of colors across video frames. The DCGP is a fixed-length signature that records and tracks the gray levels across subsampled video frames; its values are extracted from the structural properties of the resulting graph. Finally, the overall video signature is generated by fusing the individual scene and motion signatures. This signature-based design is the key to the framework’s high matching speed (> 2000 fps) compared with current techniques that rely on exhaustive processing. Compressed-domain videos are used as a case study because of their wide availability; the framework avoids full video decompression and operates on tiny frames rather than full-size decompressed frames. Experiments on various standard and challenging dataset groups show that the framework is robust in both retrieval and computational performance.
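
The general idea of matching shots through fused fixed-length signatures can be illustrated with a minimal sketch. This is not the SDCP or DCGP computation, which the abstract does not specify; the binning scheme, the statistics chosen, and every function name here (`dominant_level`, `scene_signature`, `motion_signature`, `shot_signature`, `distance`) are assumptions made purely for illustration. Frames are assumed to be tiny grayscale grids given as nested lists of 0-255 values.

```python
from statistics import mean, pstdev


def dominant_level(frame, bins=8, max_value=256):
    """Return the index of the most populated intensity bin in a tiny frame."""
    counts = [0] * bins
    for row in frame:
        for pixel in row:
            counts[pixel * bins // max_value] += 1
    return counts.index(max(counts))


def scene_signature(frames):
    """Hypothetical scene part: mean and spread of dominant levels over the shot."""
    levels = [dominant_level(f) for f in frames]
    return [mean(levels), pstdev(levels)]


def motion_signature(frames):
    """Hypothetical motion part: average and largest frame-to-frame level change."""
    levels = [dominant_level(f) for f in frames]
    steps = [abs(b - a) for a, b in zip(levels, levels[1:])] or [0]
    return [mean(steps), max(steps)]


def shot_signature(frames):
    """Fuse both parts by concatenation; the length is 4 for any shot length."""
    return scene_signature(frames) + motion_signature(frames)


def distance(sig_a, sig_b):
    """L1 distance between two signatures; smaller means more similar."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))


# Toy shots: a static dark shot and a shorter shot flickering dark/bright.
dark = [[10, 10], [10, 10]]
bright = [[250, 250], [250, 250]]
static_shot = [dark] * 6
flicker_shot = [dark, bright, dark, bright]

print(distance(shot_signature(static_shot), shot_signature(static_shot)))
print(distance(shot_signature(static_shot), shot_signature(flicker_shot)))
```

Because every shot maps to the same small number of values regardless of its frame count, comparing two shots costs a handful of arithmetic operations rather than a frame-by-frame comparison, which is the property the abstract credits for the matching speed.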
