Training audio events detectors with a sound effects corpus

This paper describes the audio event detection work carried out within the VIDIVIDEO European project. Our first experiments addressed the detection of non-voice sounds such as birds, machines, traffic, water and steps. Because no corpus labelled in terms of audio events was available, we trained on a relatively small sound effects corpus. Initial experiments with one-against-all SVM classifiers for these 5 classes showed that training on this type of data is feasible, which avoids the extremely tedious task of manually labelling a very large number of audio events. Preliminary integration experiments are quite promising.
