Creating personalized video summaries via semantic event detection

Video summarization has great potential in many application areas, enabling fast browsing and efficient video indexing. Because watching an entire video can be time-consuming, viewers prefer to browse a summary that contains the content they enjoy. We therefore believe an automated tool capable of generating personalized video summaries is needed. In this paper, we propose a new event detection-based personalized video summarization framework and apply it to create film and soccer video summaries. To achieve effective event detection, we introduce two transfer learning methods. The first combines a convolutional neural network with a support vector machine (CNNs–SVM). The second uses a fine-tuned summarization network (SumNet) that fuses fine-tuned object and scene networks. The training data consist of two datasets: (1) a set of 21K web images of back hugging, hand shaking, and standing talking, used to detect film events, and (2) a set of 30K web images of soccer matches showing goals, fouls, and yellow cards, used to detect soccer events. Given an original video, we first segment it into shots and then apply the trained model to detect events in them. Finally, based on user-specified preferences, we generate a personalized event-based summary. We test our framework on several film and soccer videos. Experimental results demonstrate that the proposed fine-tuned SumNet achieves the best performance, 96.88% and 98.50%, showing that it is effective for generating personalized video summaries.
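The abstract describes the pipeline only at a high level (segment into shots, detect events, filter by user preference). The sketch below illustrates those three stages under stated assumptions and is not the authors' implementation. Shot boundaries are found with a simple colour-histogram difference threshold (an assumption; the abstract does not say how segmentation is done), each shot is labeled by classifying its middle frame with a callable that merely stands in for the trained CNNs–SVM or SumNet model, and the summary keeps the shots whose detected event is among the user's preferred events. All function names and parameters (`segment_shots`, `detect_events`, `personalized_summary`, `threshold`, `min_score`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical sketch: names, thresholds and the histogram-based cut rule are
# illustrative assumptions, not the paper's implementation.


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two normalized colour histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def segment_shots(frame_hists: Sequence[Sequence[float]],
                  threshold: float) -> List[Tuple[int, int]]:
    """Split frames into (start, end) shots, declaring a cut wherever
    consecutive histograms differ by more than `threshold` (assumed rule)."""
    if not frame_hists:
        return []
    shots: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(frame_hists)):
        if histogram_distance(frame_hists[i - 1], frame_hists[i]) > threshold:
            shots.append((start, i - 1))
            start = i
    shots.append((start, len(frame_hists) - 1))
    return shots


@dataclass
class LabeledShot:
    start: int
    end: int
    event: str     # e.g. "goal", "foul", "hand shaking"
    score: float   # classifier confidence for `event`


def detect_events(shots: List[Tuple[int, int]],
                  classify_frame: Callable[[int], Tuple[str, float]]
                  ) -> List[LabeledShot]:
    """Label each shot by classifying its middle frame. `classify_frame`
    stands in for a trained CNN-SVM or SumNet model (assumption)."""
    labeled = []
    for start, end in shots:
        event, score = classify_frame((start + end) // 2)
        labeled.append(LabeledShot(start, end, event, score))
    return labeled


def personalized_summary(shots: List[LabeledShot], preferred: Set[str],
                         min_score: float = 0.5) -> List[LabeledShot]:
    """Keep, in temporal order, the shots whose detected event is one the
    user asked for and whose confidence reaches `min_score`."""
    return [s for s in shots if s.event in preferred and s.score >= min_score]


if __name__ == "__main__":
    # Toy two-bin "histograms" for five frames: cuts occur after frames 1 and 3.
    hists = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0], [0.95, 0.05]]
    shots = segment_shots(hists, threshold=0.5)  # [(0, 1), (2, 3), (4, 4)]
    # A fake "model" keyed by middle-frame index, for illustration only.
    fake_model = {0: ("goal", 0.9), 2: ("foul", 0.8), 4: ("goal", 0.3)}
    labeled = detect_events(shots, lambda i: fake_model[i])
    print([(s.start, s.end, s.event)
           for s in personalized_summary(labeled, {"goal"})])
```

Running the example prints `[(0, 1, 'goal')]`: the third shot is also labeled "goal" but falls below the assumed confidence cut-off. Keeping the classifier as a plain callable separates the preference-based selection step from whichever event detector is plugged in.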
