Video Similarity Measurement and Search

The volume of digital video is enormous, owing to technological advances in video capture, storage and compression. However, the usefulness of these volumes is limited by the effectiveness of content-based video retrieval (CBVR) systems. Video matching for retrieval is the core of CBVR systems: videos are matched based on their visual features and on how those features evolve across frames. Matching also serves as a foundational layer for inferring semantic similarity at a later stage, in combination with metadata. This chapter presents and discusses the core concepts, problems and recent trends of the field. It aims to give the reader the knowledge needed to select a suitable feature set and adequate techniques for developing robust research in this field.
