Video summarization via exploring the global and local importance

Video summarization generates a short video that retains the important or interesting content of a long video. By removing unnecessary video data, it reduces the time required to analyze archived video. This work proposes a novel method for generating dynamic video summaries by fusing global importance and local importance, both based on multiple features and image quality. First, each video is split into several suitable clips. Second, frames are extracted from each clip, along with the center part of each frame. Third, for each frame and its center part, the global importance and the local importance are calculated using a set of features and image quality. Finally, the global importance and the local importance are fused to select an optimal subset for generating the video summary. Extensive experiments demonstrate that the proposed method generates high-quality video summaries.
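The four steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: that global importance is scored on the whole frame and local importance on its center part, the pixel-variance score `toy_importance` standing in for the unspecified features and image quality, the fusion weight `alpha`, the crop `ratio`, and the greedy selection under a frame `budget`. All function names are hypothetical.

```python
from statistics import pvariance


def center_part(frame, ratio=0.5):
    """Return the central region of a 2-D grayscale frame (a list of rows)."""
    h, w = len(frame), len(frame[0])
    ch, cw = max(1, int(h * ratio)), max(1, int(w * ratio))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in frame[top:top + ch]]


def toy_importance(region):
    """Placeholder score (assumption): pixel variance of the region.

    The abstract's actual features and image-quality measure are not
    specified, so this is only a stand-in.
    """
    pixels = [p for row in region for p in row]
    return pvariance(pixels)


def clip_score(frames, alpha=0.5):
    """Fuse the mean global (whole-frame) and local (center) scores of a clip."""
    global_score = sum(toy_importance(f) for f in frames) / len(frames)
    local_score = sum(toy_importance(center_part(f)) for f in frames) / len(frames)
    # Weighted sum is an assumed fusion rule, not taken from the abstract.
    return alpha * global_score + (1 - alpha) * local_score


def select_clips(clips, budget):
    """Greedily pick the highest-scoring clips that fit in a frame budget.

    Returns the chosen clip indices in temporal order.
    """
    ranked = sorted(range(len(clips)), key=lambda i: clip_score(clips[i]),
                    reverse=True)
    chosen, used = [], 0
    for i in ranked:
        if used + len(clips[i]) <= budget:
            chosen.append(i)
            used += len(clips[i])
    return sorted(chosen)
```

Calling `select_clips(clips, budget)` on a list of clips, each a list of frames, returns the indices of the clips kept for the summary. A greedy pass is only one simple way to choose a subset; the abstract does not say how the optimal subset is found.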
