A computationally efficient speech/music discriminator for radio recordings

This paper presents a speech/music discriminator for radio recordings based on a new, computationally efficient region growing technique that originates in the field of image segmentation. The proposed scheme operates on a single feature, a variant of the spectral entropy, which is extracted from the audio recording by means of short-term processing. The method has been tested on recordings from radio stations broadcasting over the Internet and, despite its simplicity, yields performance comparable to that of more sophisticated approaches.
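
To make the single-feature idea concrete, the following is a minimal sketch of short-term spectral entropy extraction. It assumes the textbook definition (Shannon entropy of the normalized power spectrum of each frame), not the specific variant used in the paper, which the abstract does not define. The function names, the frame length, and the hop size are illustrative assumptions, and the region growing stage is not shown.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum over the non-negative frequency bins."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy (in bits) of the normalized power spectrum of one frame.

    Assumption: this is the standard definition, not the paper's variant.
    """
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0  # silent frame: entropy set to 0 by convention in this sketch
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def short_term_entropy(signal, frame_len=64, hop=32):
    """Slide a window over the signal and return one entropy value per frame."""
    return [spectral_entropy(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]
```

Under this definition, a frame whose energy is concentrated in a few frequency bins produces a low entropy value, while a frame with a flat spectrum produces a value near the maximum, log2 of the number of bins. The resulting one-value-per-frame sequence is the kind of input a segmentation stage would operate on.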
