Aesthetics-Guided Summarization from Multiple User Generated Videos

In recent years, with the rapid development of camera technology and portable devices, user generated videos have proliferated and are gradually reshaping a media market traditionally oriented toward professional video. The volume of user generated videos in repositories is increasing rapidly. In today's video retrieval systems, a simple query returns many videos, which seriously increases the viewing burden. To manage these retrieved videos and give viewers an efficient way to browse them, we introduce a system that automatically generates a summary from multiple user generated videos and presents their salient content to viewers in an enjoyable manner. The quality of consumer videos is highly diverse, owing to factors such as the photographer's experience or the environmental conditions at the time of capture. This diversity motivates us to include a video quality evaluation component in the summarization, since poor-quality videos can seriously degrade the viewing experience. We first propose a probabilistic model to evaluate the aesthetic quality of each user generated video. This model compares the rich aesthetics information from several well-known photo databases with generic unlabeled consumer videos, under a human perception component indicating the correlation between a video and its constituent frames. Subjective studies indicate that our method is reliable. We then propose a novel graph-based formulation for the multi-video summarization task. Desirable summarization criteria are incorporated as graph attributes, and the problem is solved through a dynamic programming framework. Comparisons with several state-of-the-art methods demonstrate that our algorithm outperforms them in generating a skimming video that preserves the essential scenes from the multiple input videos, with smooth transitions among consecutive segments and appealing aesthetics overall.
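
As a rough illustration of how a graph-based summarization problem can be solved by dynamic programming, the sketch below treats candidate segments as nodes of a time-ordered graph and selects a fixed number of them. It is not the formulation used in this work, whose details the abstract does not give. The function name `select_segments`, the per-segment `quality` scores (standing in for aesthetic scores), the `transition` function (standing in for transition smoothness), and the fixed segment count `k` are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def select_segments(
    quality: Sequence[float],
    transition: Callable[[int, int], float],
    k: int,
) -> Tuple[float, List[int]]:
    """Pick k segments, in index order, maximizing node plus edge scores.

    Hypothetical setup: quality[j] is a node attribute of segment j and
    transition(i, j) is an edge attribute between segments i < j.
    """
    n = len(quality)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of segments")

    neg_inf = float("-inf")
    # best[c][j]: best score of a path with c + 1 segments ending at segment j
    best = [[neg_inf] * n for _ in range(k)]
    # prev[c][j]: predecessor of j on that best path (-1 if none)
    prev = [[-1] * n for _ in range(k)]

    for j in range(n):
        best[0][j] = quality[j]

    for c in range(1, k):
        for j in range(c, n):
            for i in range(c - 1, j):
                if best[c - 1][i] == neg_inf:
                    continue
                cand = best[c - 1][i] + transition(i, j) + quality[j]
                if cand > best[c][j]:
                    best[c][j] = cand
                    prev[c][j] = i

    # Choose the best end node, then walk the predecessor links backwards.
    end = max(range(n), key=lambda j: best[k - 1][j])
    score = best[k - 1][end]
    path: List[int] = []
    j = end
    for c in range(k - 1, -1, -1):
        path.append(j)
        j = prev[c][j]
    path.reverse()
    return score, path
```

The table has k rows and n columns, and each cell scans its possible predecessors, so the running time is O(k n^2). Real criteria such as content coverage or a duration budget would need additional node or edge attributes and possibly a different state definition.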
