Audio and video combined for home video abstraction

As more people can afford to record their lives on video, home videos play an increasingly important role in multimedia. Video abstraction is an efficient way to review large collections of home video. This paper presents a home video abstraction technique that combines audio and video features. The audio content is first classified as silence, pure speech, non-pure speech, music, or background sound using support vector machines (SVMs). Non-pure speech is then further classified into song and other non-pure speech using an SVM, and background sound is classified into laughter, applause, scream, and others using hidden Markov models (HMMs). For the video content, the motion level and blur degree are computed. Finally, video segments containing special features, such as speech, laughter, song, applause, scream, and specified motion level and blur degree, are extracted as the main parts of the abstract. The remaining parts of the abstract are generated using key frame information. Experimental results show that the proposed algorithm can extract the desired parts of a home video to generate satisfactory video abstracts.
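The two-stage audio classification and the segment selection described above can be sketched as follows. This is only an illustrative outline, not the authors' implementation: the trained SVMs and HMMs are represented by plain callables, and the `Segment` fields, the `SPECIAL_AUDIO` set, the treatment of both speech classes as "speech", and the motion and blur thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
# Stand-in for a trained model (SVM or HMM): maps audio features to a label.
Classifier = Callable[[Features], str]


@dataclass
class Segment:
    start: float          # segment start time in seconds
    end: float            # segment end time in seconds
    audio: Features       # audio features (hypothetical representation)
    motion_level: float   # assumed to be normalized to [0, 1]
    blur_degree: float    # assumed to be normalized to [0, 1]


def classify_audio(audio: Features,
                   top_level: Classifier,
                   refine_speech: Classifier,
                   refine_background: Classifier) -> str:
    """Coarse label first, then refine the two classes that are split further."""
    label = top_level(audio)  # silence, pure speech, non-pure speech, music, background sound
    if label == "non-pure speech":
        return refine_speech(audio)       # song / other non-pure speech
    if label == "background sound":
        return refine_background(audio)   # laughter / applause / scream / others
    return label


# Assumption: both speech classes count as "speech" for selection purposes.
SPECIAL_AUDIO = frozenset({
    "pure speech", "other non-pure speech", "song",
    "laughter", "applause", "scream",
})


def select_segments(segments: List[Segment],
                    label_of: Classifier,
                    motion_range: Tuple[float, float] = (0.0, 1.0),
                    max_blur: float = 0.5) -> List[Segment]:
    """Keep segments with a special audio label and acceptable motion and blur.

    The thresholds are hypothetical placeholders, not values from the paper.
    """
    low, high = motion_range
    selected = []
    for seg in segments:
        if label_of(seg.audio) not in SPECIAL_AUDIO:
            continue
        if not (low <= seg.motion_level <= high):
            continue
        if seg.blur_degree > max_blur:
            continue
        selected.append(seg)
    return selected
```

In this sketch the selected segments would form the main parts of the abstract; filling the remaining parts from key frame information is left out because the abstract does not describe that step in detail.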
