Discrimination of speech and monophonic singing in continuous audio streams applying multi-layer support vector machines

We present a novel approach to discriminating speech from monophonic singing for use in music information retrieval applications. A working prototype is introduced that applies multi-layer support vector machines to static high-level features derived from the pitch and energy contours of an acoustic signal. The feature set used for discrimination is presented and ranked by linear discriminant analysis. For automatic segmentation of an input signal stream, a further feature set is used to discriminate signal from noise. A corpus for training and evaluation, comprising speech and monophonic singing data from nine performers, is described in detail. The data were labeled according to the judgment of a separate group of test subjects. A recognition rate of 99.2% was reached, demonstrating the high performance of the proposed methods.
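
As a rough illustration of what "static high-level features" and a discriminant-based ranking might look like, the following minimal sketch collapses per-frame pitch and energy contours into a few utterance-level statistics and orders them by a univariate Fisher criterion. The chosen statistics (mean, standard deviation, range), the function names, and the use of the univariate Fisher ratio as a stand-in for the linear discriminant analysis are all assumptions made for illustration; the abstract does not specify the actual feature set or ranking procedure.

```python
from statistics import mean, pstdev


def contour_stats(contour):
    """Collapse a per-frame contour into static, utterance-level statistics."""
    return {
        "mean": mean(contour),
        "std": pstdev(contour),
        "range": max(contour) - min(contour),
    }


def static_features(pitch, energy):
    """Build one flat feature dict from a pitch and an energy contour.

    The statistics used here are hypothetical examples, not the paper's set.
    """
    feats = {}
    for name, contour in (("pitch", pitch), ("energy", energy)):
        for stat, value in contour_stats(contour).items():
            feats[f"{name}_{stat}"] = value
    return feats


def fisher_ratio(values_a, values_b):
    """Univariate Fisher criterion: squared mean distance over summed variances."""
    var_sum = pstdev(values_a) ** 2 + pstdev(values_b) ** 2
    mean_gap = mean(values_a) - mean(values_b)
    if var_sum == 0:
        # Degenerate case: no within-class spread in either class.
        return float("inf") if mean_gap != 0 else 0.0
    return mean_gap ** 2 / var_sum


def rank_features(class_a, class_b):
    """Return feature names sorted from most to least discriminative.

    class_a and class_b are lists of feature dicts sharing the same keys.
    """
    names = list(class_a[0].keys())
    scores = {
        n: fisher_ratio([f[n] for f in class_a], [f[n] for f in class_b])
        for n in names
    }
    return sorted(names, key=lambda n: scores[n], reverse=True)
```

In such a setup, the ranked feature vectors for the speech and singing classes would then be passed to a classifier; the support vector machine stage itself is omitted here.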
