On-the-fly video genre classification by combination of audio features

Video genre identification methods are frequently based on image or motion analysis, which are relatively time-consuming processes. Such approaches are suited to batch processing, but as-soon-as-possible identification requires faster methods. In this paper, we investigate the use of audio-only methods for on-the-fly video classification. We propose to use several acoustic feature streams, and we evaluate various schemes for combining them at the frame level or at the score level. Results are compared with those obtained by human listeners as a function of listening duration. Although the system based on model combination slightly outperforms humans on very early detection, humans remain significantly more accurate on long sessions.
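To make the distinction between the two combination levels concrete, here is a minimal sketch in Python. It is not the system evaluated in the paper: the function names, the toy feature values, the genre labels, and the equal stream weights are all assumptions chosen for illustration. Frame-level combination is shown as simple concatenation of per-frame feature vectors, and score-level combination as a weighted sum of per-genre log-scores produced by separate, hypothetical per-stream models.

```python
from typing import Dict, List, Sequence


def frame_level_combine(streams: Sequence[List[List[float]]]) -> List[List[float]]:
    """Concatenate the feature vectors of several streams, frame by frame.

    Each stream is a list of frames; each frame is a list of floats.
    All streams are assumed to be time-aligned (same number of frames).
    """
    n_frames = len(streams[0])
    if any(len(s) != n_frames for s in streams):
        raise ValueError("all streams must have the same number of frames")
    return [[x for s in streams for x in s[t]] for t in range(n_frames)]


def score_level_combine(
    stream_scores: Sequence[Dict[str, float]],
    weights: Sequence[float],
) -> str:
    """Return the genre with the highest weighted sum of per-stream log-scores.

    stream_scores[i][genre] is the log-score that the (hypothetical) model
    trained on stream i assigns to that genre.
    """
    genres = stream_scores[0].keys()
    combined = {
        g: sum(w * s[g] for w, s in zip(weights, stream_scores)) for g in genres
    }
    return max(combined, key=combined.get)


# Toy data: two time-aligned streams with two frames each (illustrative values).
stream_a = [[1.0, 2.0], [3.0, 4.0]]
stream_b = [[0.5], [0.6]]
merged = frame_level_combine([stream_a, stream_b])

# Toy per-genre log-scores from two separate stream models (illustrative values).
scores_a = {"news": -10.0, "music": -12.0}
scores_b = {"news": -15.0, "music": -11.0}
decision = score_level_combine([scores_a, scores_b], weights=[0.5, 0.5])
```

In the frame-level case a single model would then be trained on the merged vectors, whereas in the score-level case each stream keeps its own model and only the decisions are fused. The weights here are fixed by hand; in practice they would need to be tuned on held-out data.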
