Memorable and rich video summarization

Abstract: Video summarization facilitates rapid browsing and efficient video indexing in many applications. A good summary should preserve the semantic interestingness and diversity of the original video. Whereas many previous methods extract key frames based on low-level features, this study proposes Memorability-Entropy-based video summarization. The proposed method uses image memorability to create semantically interesting summaries and image entropy to maintain their diversity. In the proposed framework, perceptual hashing-based mutual information (MI) is first used for shot segmentation. A large annotated image memorability dataset is then used to fine-tune Hybrid-AlexNet. For each frame, the fine-tuned deep network predicts a memorability score and the image entropy is calculated. The frame with the maximum memorability score and entropy value in each shot is selected for the video summary. The method is evaluated on a benchmark dataset that includes five human-created summaries. The evaluation shows that it generates high-quality results, comparable to human-created summaries and conventional methods.
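The per-shot selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `image_entropy` and `select_key_frames` are hypothetical, the `memorability` argument is a placeholder for the fine-tuned network, and the abstract does not state how the two scores are combined. Here they are simply added after scaling entropy by an assumed maximum of 8 bits for 8-bit grayscale frames.

```python
import math
from collections import Counter


def image_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of gray levels."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_key_frames(shots, memorability, max_entropy=8.0):
    """Return, for each shot, the index of the frame with the highest score.

    shots: list of shots; each shot is a list of frames, and each frame is
        a flat list of gray levels in 0..255.
    memorability: callable mapping a frame to a score in [0, 1]; a stand-in
        for the fine-tuned network (assumption).
    max_entropy: scaling constant so entropy falls in [0, 1] (assumption).
    """
    key_indices = []
    for frames in shots:
        # Assumed combination rule: memorability plus normalized entropy.
        scores = [memorability(f) + image_entropy(f) / max_entropy for f in frames]
        key_indices.append(max(range(len(frames)), key=scores.__getitem__))
    return key_indices
```

With a constant memorability score, the selection reduces to picking the highest-entropy frame in each shot, which makes the role of the entropy term easy to check in isolation.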
