Automatic consumer video summarization by audio and visual analysis

Video summarization provides a condensed version of a video stream by analyzing the video content. Automatic summarization of consumer videos is an important tool that facilitates efficient browsing, searching, and album creation in large consumer video collections. This paper studies automatic video summarization in the consumer domain, where most previous methods cannot be easily applied because content analysis is difficult: consumer videos are captured under uncontrolled conditions such as uneven illumination, clutter, and large camera motion, and their soundtracks are of poor quality, mixing multiple sound sources under severe noise. To pursue reliable summarization, a case study with actual consumer users is conducted, from which a set of consumer-oriented guidelines is obtained. The guidelines reflect the high-level semantic rules, in both visual and audio aspects, that consumers recognize as important for producing good video summaries. Following these guidelines, an automatic video summarization algorithm is developed in which both visual and audio information are used to generate improved summaries. To the best of our knowledge, this is the first systematic study on automatic summarization of consumer-quality videos. Experimental evaluations by consumer subjects show the effectiveness of our approach.
