Adaptive Feature Selection for Speech / Music Classification

In this paper, we propose a new system for classifying audio segments as speech or music. The proposed system improves classification accuracy, particularly in low signal-to-noise ratio (SNR) environments. For a given SNR value, the system selects the features that yield the highest classification accuracy at that SNR. The values of these features are compared to thresholds, which are also adapted to the SNR. A multi-expert method of combining the features is implemented to improve classification accuracy. A new feature, termed the variance of low-band energy ratio, is also introduced. This feature produces large improvements in classification accuracy at low SNR. The performance of the proposed system is evaluated at different SNR values using a library of speech and music audio segments. Using one-second segments, it is shown that the proposed system can enhance the classification accuracy by 22% at SNR = -15 dB and obtain a classification accuracy of 90.3% at SNR = 0 dB.
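The pipeline described above (SNR-dependent feature selection, SNR-dependent thresholds, and a multi-expert combination) can be illustrated with a minimal Python sketch. The abstract does not give the feature definitions, SNR ranges, thresholds, or combination rule, so everything concrete here is an assumption made for illustration: the table `SNR_TABLE`, the feature names (`vlber`, `zcr_var`, `rms_var`, `spectral_flux`), all numeric values, the direction of each comparison, and the use of a simple majority vote as the multi-expert rule. The variance of low-band energy ratio is likewise assumed to be the variance, across the frames of a segment, of low-band energy divided by total energy.

```python
from statistics import pvariance

# Hypothetical lookup table: (snr_low_db, snr_high_db, experts).
# Each expert is (feature_name, threshold, speech_if_above).
# Every name, range, threshold, and direction is a placeholder,
# not a value taken from the paper.
SNR_TABLE = [
    (float("-inf"), 0.0, [("vlber", 0.02, True)]),
    (0.0, 15.0, [("vlber", 0.015, True),
                 ("zcr_var", 0.004, True),
                 ("rms_var", 0.1, True)]),
    (15.0, float("inf"), [("zcr_var", 0.003, True),
                          ("rms_var", 0.08, True),
                          ("spectral_flux", 0.5, False)]),
]


def low_band_energy_ratio_variance(low_band_energy, total_energy):
    """Assumed definition: variance over frames of (low-band / total) energy.

    Both arguments are per-frame energy sequences for one segment.
    Frames with zero total energy are skipped.
    """
    ratios = [lo / tot
              for lo, tot in zip(low_band_energy, total_energy)
              if tot > 0]
    return pvariance(ratios) if len(ratios) > 1 else 0.0


def select_experts(snr_db):
    """Return the feature/threshold list assigned to the given SNR."""
    for low, high, experts in SNR_TABLE:
        if low <= snr_db < high:
            return experts
    raise ValueError(f"SNR {snr_db!r} is not covered by SNR_TABLE")


def classify_segment(features, snr_db):
    """Majority vote of threshold 'experts' chosen for this SNR.

    `features` maps feature names to values for one segment.
    A tie is resolved as "music" (an arbitrary choice for this sketch).
    """
    experts = select_experts(snr_db)
    speech_votes = 0
    for name, threshold, speech_if_above in experts:
        above = features[name] > threshold
        if above == speech_if_above:
            speech_votes += 1
    return "speech" if 2 * speech_votes > len(experts) else "music"


if __name__ == "__main__":
    # Two frames with ratios 0.25 and 0.75.
    vlber = low_band_energy_ratio_variance([1.0, 3.0], [4.0, 4.0])
    print(vlber)
    print(classify_segment({"vlber": 0.05}, snr_db=-15.0))
```

In a real system the table would be filled from training data, for example by measuring each feature's accuracy at each SNR and keeping the best performers, and the SNR itself would come from an estimator rather than being passed in directly.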
