Video Fragmentation and Reverse Search on the Web

This chapter focuses on methods and tools for video fragmentation and reverse search on the web. These technologies can assist journalists in dealing with fake news, which nowadays spreads rapidly via social media platforms and often relies on the reuse of a video from a past event, posted with the intention of misleading viewers about a contemporary event. Fragmenting a video into visually and temporally coherent parts and extracting a representative keyframe for each fragment provides a complete and concise keyframe-based summary of the video. Compared to straightforward approaches that sample video frames at a constant step, the summary generated through video fragmentation and keyframe extraction is considerably more effective for exploring the video content and performing a fragment-level search for the video on the web. The chapter starts by explaining, in its introductory part, the nature and characteristics of this type of reuse-based fake news, and continues with an overview of existing approaches for the temporal fragmentation of single-shot videos into sub-shots (the most appropriate level of temporal granularity when dealing with user-generated videos) and of tools for performing reverse video search on the web. Subsequently, it describes two state-of-the-art methods for video sub-shot fragmentation, one relying on the assessment of visual coherence over sequences of frames and another based on identifying the camera activity during video recording, and presents the InVID web application, which enables fine-grained (fragment-level) reverse search for near-duplicates of a given video on the web.
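The first fragmentation strategy mentioned above, based on visual coherence, can be illustrated with a minimal sketch: start a new sub-shot whenever the histogram distance between consecutive frames exceeds a threshold, then pick the middle frame of each sub-shot as its keyframe. This is a simplified illustration of the general idea, not the actual InVID method; the histogram type, distance measure, and threshold are assumptions.

```python
from collections import Counter

def histogram(frame, bins=16):
    """Normalised intensity histogram of a frame (a flat list of 0-255 values)."""
    counts = Counter(min(p * bins // 256, bins - 1) for p in frame)
    n = len(frame)
    return [counts.get(b, 0) / n for b in range(bins)]

def subshot_fragments(frames, threshold=0.3):
    """Start a new sub-shot when the L1 distance between the histograms of
    consecutive frames exceeds `threshold` (illustrative sketch only).
    Returns (start, end) index pairs, end exclusive."""
    hists = [histogram(f) for f in frames]
    boundaries = [0]
    for i in range(1, len(frames)):
        dist = sum(abs(a - b) for a, b in zip(hists[i], hists[i - 1]))
        if dist > threshold:
            boundaries.append(i)
    boundaries.append(len(frames))
    return list(zip(boundaries[:-1], boundaries[1:]))

def keyframes(fragments):
    """Use the middle frame index of each sub-shot as its keyframe."""
    return [(s + e) // 2 for s, e in fragments]

# Synthetic single-shot "video": 10 dark frames followed by 10 bright frames.
frames = [[20] * 64] * 10 + [[200] * 64] * 10
frags = subshot_fragments(frames)
print(frags)             # [(0, 10), (10, 20)]
print(keyframes(frags))  # [5, 15]
```

Note how this differs from constant-step sampling: the keyframes land inside the visually coherent segments, so each segment is represented exactly once regardless of its length.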
The chapter then reports the findings of a series of experimental evaluations of the above technologies, which indicate their ability to generate a concise and complete keyframe-based summary of the video content, and the usefulness of this fragment-level representation for fine-grained reverse video search on the web. Finally, it draws conclusions about the effectiveness of the presented technologies and outlines our plans for further advancing them.
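The second fragmentation strategy, based on camera activity, can likewise be sketched as grouping consecutive frames by their dominant camera-motion type. Here the per-frame displacement vectors are assumed to be precomputed (e.g., by optical flow or feature tracking); the labels and the classification rule are illustrative assumptions, not the method described in the chapter.

```python
def motion_label(dx, dy, eps=1.0):
    """Classify a frame-to-frame displacement as a camera-motion type
    (simplified: only static, pan, and tilt are distinguished)."""
    if abs(dx) < eps and abs(dy) < eps:
        return "static"
    if abs(dx) >= abs(dy):
        return "pan-right" if dx > 0 else "pan-left"
    return "tilt-down" if dy > 0 else "tilt-up"

def motion_fragments(displacements):
    """Group consecutive frames sharing the same motion type into sub-shots.
    Returns (start, end, label) triples, end exclusive."""
    fragments = []
    start, current = 0, motion_label(*displacements[0])
    for i, (dx, dy) in enumerate(displacements[1:], 1):
        label = motion_label(dx, dy)
        if label != current:
            fragments.append((start, i, current))
            start, current = i, label
    fragments.append((start, len(displacements), current))
    return fragments

# Hypothetical displacements: a still camera, then a rightward pan.
disp = [(0.1, 0.0)] * 5 + [(4.0, 0.2)] * 5
print(motion_fragments(disp))  # [(0, 5, 'static'), (5, 10, 'pan-right')]
```

Each resulting sub-shot corresponds to a single camera action, which matches the intuition that a change of camera activity marks a new unit of content within a single-shot video.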
