Paper information - Robust audio-based classification of video genre

Robust audio-based classification of video genre

Video genre classification is a challenging task given the fast-growing video collections available on the Internet. This paper presents a new method for video genre identification by audio analysis. Our approach relies on the combination of low-level and high-level audio features. We investigate the discriminative capacity of features related to acoustic instability, speaker interactivity, speech quality and acoustic space characterization. Genre identification is performed on these features using an SVM classifier. Experiments are conducted on a corpus composed of cartoons, movies, news, commercials and music, on which we obtain an identification rate of 91%.

Index Terms: video genre classification, audio-based video processing

Georges Linarès | Mickael Rouvier | Driss Matrouf
