Unsupervised Video Summarization Based on Consistent Clip Generation

It has become increasingly convenient for people to shoot, store and share videos of their daily lives on social networks, which in turn makes it ever more difficult to find desired content in massive amounts of video data. Automatic video summarization methods are therefore needed. Previous methods focus on category-specific videos and build various complex models, while recent deep learning approaches require large amounts of annotated data to train their networks. This paper proposes a new unsupervised video summarization method that selects a group of self-consistent highlight clips. Specifically, we propose a consistent clip generation method, namely a cutting-merging-adjusting scheme, based on clip similarity and local similarity. Consistent clips are obtained by iteratively merging similar clips and then adjusting the boundaries of each consistent clip to remove inconsistencies between clip boundaries and the boundaries of logical events. We then estimate the importance score of each consistent clip from the interestingness scores of its frames and select the most important clips to generate a video summary. Experimental results show that, compared with several existing methods, our method generates high-quality summaries that are closer to human perception.
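The abstract names the stages of the cutting-merging-adjusting scheme but not their exact formulas. The Python sketch below illustrates one plausible reading of the pipeline. Every concrete choice in it is an assumption, not the paper's method. Per-frame feature vectors and per-frame interestingness scores are taken as given inputs. Clip similarity is the cosine similarity of mean clip features, and local similarity is the cosine similarity of the two frames on either side of a boundary. Clip importance is the mean frame interestingness, and selection is greedy under a length budget. The function names (`cut`, `merge`, `adjust`, `summarize`) and parameters (`clip_len`, `threshold`, `window`, `ratio`) are hypothetical.

```python
import math

# Hypothetical sketch of a cutting-merging-adjusting pipeline; not the paper's exact method.

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_vec(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cut(n_frames, clip_len):
    """Cutting: split frames [0, n_frames) into fixed-length clips (start, end), end exclusive."""
    return [(s, min(s + clip_len, n_frames)) for s in range(0, n_frames, clip_len)]

def merge(clips, feats, threshold):
    """Merging: repeatedly fuse the most similar pair of adjacent clips while the
    cosine similarity of their mean features is at least `threshold`.
    Similarities are recomputed every pass, which is simple but O(n^2) overall."""
    clips = list(clips)
    while len(clips) > 1:
        sims = [cosine(mean_vec(feats[a:b]), mean_vec(feats[c:d]))
                for (a, b), (c, d) in zip(clips, clips[1:])]
        i = max(range(len(sims)), key=sims.__getitem__)
        if sims[i] < threshold:
            break
        clips[i:i + 2] = [(clips[i][0], clips[i + 1][1])]
    return clips

def adjust(clips, feats, window):
    """Adjusting: move each inner boundary by at most `window` frames to the cut
    point b whose neighbouring frames b-1 and b are least similar (local similarity),
    never emptying a clip."""
    bounds = [s for s, _ in clips] + [clips[-1][1]]
    for k in range(1, len(bounds) - 1):
        lo = max(bounds[k - 1] + 1, bounds[k] - window)
        hi = min(bounds[k + 1] - 1, bounds[k] + window)
        bounds[k] = min(range(lo, hi + 1),
                        key=lambda b: cosine(feats[b - 1], feats[b]))
    return list(zip(bounds[:-1], bounds[1:]))

def summarize(feats, interest, clip_len=2, threshold=0.9, window=1, ratio=0.5):
    """Score consistent clips by mean frame interestingness and greedily keep the
    best ones that fit in ratio * len(feats) frames; return them in temporal order."""
    clips = adjust(merge(cut(len(feats), clip_len), feats, threshold), feats, window)
    score = {c: sum(interest[c[0]:c[1]]) / (c[1] - c[0]) for c in clips}
    budget, used, chosen = ratio * len(feats), 0, []
    for c in sorted(clips, key=score.get, reverse=True):
        if used + (c[1] - c[0]) <= budget:
            chosen.append(c)
            used += c[1] - c[0]
    return sorted(chosen)
```

For instance, with eight frames whose features switch from `[1, 0]` to `[0, 1]` halfway and whose interestingness rises from 0.2 to 0.8, `summarize` returns `[(4, 8)]`: merging yields the two halves as separate clips, and only the more interesting one fits the 50% budget. Greedy selection is only a simple stand-in for the selection step; a 0/1 knapsack over clip lengths is a common alternative when clip lengths vary widely.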
