Fast Summarization of User-Generated Videos: Exploiting Semantic, Emotional, and Quality Clues

This article introduces a novel approach for fast summarization of user-generated videos (UGVs). Unlike other types of video, whose semantic content may vary greatly over time, most UGVs consist of a single shot with relatively consistent high-level semantics and emotional content. A few representative segments, selected using segment-level semantic and emotional recognition results, are therefore generally sufficient for a summary. In addition, because many UGVs suffer from poor shooting quality, factors such as camera shake and lighting conditions are also taken into account to produce more pleasant summaries. This article is part of a special issue on quality modeling.
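To make the selection idea concrete, here is a minimal sketch of how segment-level semantic, emotional, and quality cues could be combined into a single score and used to pick segments under a length budget. The score weights, the quality formula, and the greedy selection are illustrative assumptions, not the authors' exact method; the segment attributes are assumed to come from upstream recognizers.

```python
# Hypothetical sketch: combine semantic, emotional, and quality cues to pick
# summary segments. Weights and selection strategy are assumptions for
# illustration, not the paper's exact formulation.
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float     # segment start time (seconds)
    end: float       # segment end time (seconds)
    semantic: float  # confidence for the video's main semantic concept, in [0, 1]
    emotion: float   # strength of the dominant emotion, in [0, 1]
    shake: float     # camera-shake level, in [0, 1] (higher = shakier)
    lighting: float  # lighting quality, in [0, 1] (higher = better lit)

    @property
    def duration(self) -> float:
        return self.end - self.start


def score(seg: Segment, w_sem: float = 0.5, w_emo: float = 0.3, w_qual: float = 0.2) -> float:
    """Combine semantic, emotional, and quality cues into one segment score.

    Quality rewards stable, well-lit footage; the weights are hypothetical.
    """
    quality = 0.5 * (1.0 - seg.shake) + 0.5 * seg.lighting
    return w_sem * seg.semantic + w_emo * seg.emotion + w_qual * quality


def summarize(segments: List[Segment], budget: float) -> List[Segment]:
    """Greedily keep the highest-scoring segments until the time budget is filled."""
    chosen: List[Segment] = []
    used = 0.0
    for seg in sorted(segments, key=score, reverse=True):
        if used + seg.duration <= budget:
            chosen.append(seg)
            used += seg.duration
    # Present the selected segments in their original temporal order.
    return sorted(chosen, key=lambda s: s.start)


if __name__ == "__main__":
    segs = [
        Segment(0, 5, semantic=0.9, emotion=0.4, shake=0.2, lighting=0.8),
        Segment(5, 10, semantic=0.3, emotion=0.9, shake=0.7, lighting=0.5),
        Segment(10, 15, semantic=0.8, emotion=0.7, shake=0.1, lighting=0.9),
    ]
    for seg in summarize(segs, budget=10.0):
        print(f"{seg.start:.0f}-{seg.end:.0f}s  score={score(seg):.2f}")
```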
