Automatic home video abstraction using audio contents

As more people can afford to record their lives on video, home videos play an increasingly important role in people's lives. Video abstraction is an efficient way to review such large collections of home video. This paper presents an automatic home video abstraction method based mainly on audio content. The audio is first segmented and classified as speech, music, silence, or special sounds, based on short-time audio features and morphology. The special sounds are then further categorized as songs, laughter, applause, screams, and others using a Hidden Markov Model (HMM). Next, motion level and blur degree are obtained from the video content. Finally, video segments containing special effects, such as speech, laughter, song, applause, and screams, and having a specified motion level and blur degree, are extracted as the main parts of the abstract. The remaining parts of the abstract are generated using key frame information. Experimental results show that the proposed algorithm can extract the desired parts of home video to generate satisfactory video abstracts.
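The first stage of the pipeline, segmenting audio with short-time features followed by a morphology step, can be illustrated with a minimal sketch. The abstract does not say which features or thresholds are used, so everything here is an assumption: short-time energy and zero-crossing rate are common choices for this kind of task, the frame sizes and `energy_threshold` are hypothetical values, and `smooth_labels` is a simplified run-length stand-in for the morphological processing, not the paper's operator. The HMM stage and the video features are not covered.

```python
def frames(samples, frame_len, hop):
    """Split a sample sequence into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def label_silence(samples, frame_len=400, hop=200, energy_threshold=1e-4):
    """Label each frame 'silence' or 'sound' by thresholding its energy.

    The threshold is a hypothetical value chosen for illustration only.
    """
    return ["silence" if short_time_energy(f) < energy_threshold else "sound"
            for f in frames(samples, frame_len, hop)]


def smooth_labels(labels, min_run=3):
    """Relabel runs shorter than min_run with the label that precedes them.

    A simplified stand-in for morphological smoothing of the label sequence.
    """
    out = list(labels)
    i = 0
    while i < len(out):
        j = i
        while j < len(out) and out[j] == out[i]:
            j += 1
        if j - i < min_run and i > 0:
            for k in range(i, j):
                out[k] = out[i - 1]
        i = j
    return out
```

In a fuller system, per-frame features such as the zero-crossing rate would feed a speech/music discriminator, and the smoothed label runs would become the segments passed on to later stages.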
