Automatic classification of speech and music using neural networks

Automatic discrimination between speech and music signals has grown in importance as a research topic in recent years. Classifying audio into categories such as speech or music is a key component of many multimedia document retrieval systems, and several approaches have previously been used to discriminate between speech and music data. In this paper, we propose using the mean and variance of the discrete wavelet transform coefficients in addition to features that have previously been used for audio classification. A Multi-Layer Perceptron (MLP) neural network is used as the classifier. Our initial tests show encouraging results that indicate the viability of our approach.
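As an illustration only (not code from the paper), the described pipeline could be sketched as follows, assuming PyWavelets for the discrete wavelet transform and scikit-learn's MLPClassifier. The function name extract_features and all parameter choices (wavelet 'db4', decomposition level 4, hidden layer size 16, frame length 2048) are assumptions made for the sake of the example, not values reported by the authors.

```python
# Minimal sketch: DWT mean/variance features fed to an MLP classifier.
# Assumes PyWavelets (pywt) and scikit-learn are installed.
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

def extract_features(signal, wavelet="db4", level=4):
    """Return the mean and variance of each DWT sub-band as a feature vector."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    feats = []
    for band in coeffs:
        feats.extend([np.mean(band), np.var(band)])
    return np.array(feats)

# Toy usage: random stand-ins for speech (label 0) and music (label 1) frames.
rng = np.random.default_rng(0)
X = np.array([extract_features(rng.standard_normal(2048)) for _ in range(40)])
y = np.array([0] * 20 + [1] * 20)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))
```

In practice the random frames above would be replaced by windowed audio samples, and the DWT statistics would be concatenated with the other features mentioned in the abstract before training the MLP.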
