Histogram Equalization-Based Features for Speech, Music, and Song Discrimination

In this letter, we present a new class of segment-based features for speech, music, and song discrimination. These features, called PHEQ (Polynomial-Fit Histogram Equalization) features, are derived from the nonlinear relationship between the short-term feature distributions computed at the segment level and a reference distribution. Results show that PHEQ features outperform short-term features such as Mel-Frequency Cepstral Coefficients (MFCCs) and conventional segment-based ones such as the MFCC mean and variance. Furthermore, combining short-term and PHEQ features significantly improves the performance of the overall system.
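The idea can be sketched as follows (a minimal illustration, not the authors' implementation): for each feature dimension of a segment, the histogram-equalizing transform that maps the segment's empirical distribution onto a reference distribution (here assumed to be a standard Gaussian) is approximated by a least-squares polynomial, and the polynomial coefficients become the segment-level feature vector. Function and parameter names are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def pheq_features(segment, order=3):
    """Sketch of PHEQ segment features (illustrative, not the paper's code).

    segment: (n_frames, n_dims) array of short-term features (e.g. MFCCs).
    Returns a (n_dims * (order + 1),) vector of polynomial coefficients.
    """
    n_frames, n_dims = segment.shape
    coeffs = []
    for d in range(n_dims):
        # Sorted samples and their empirical CDF values,
        # offset by 0.5/n to avoid 0 and 1 at the extremes.
        x = np.sort(segment[:, d])
        p = (np.arange(1, n_frames + 1) - 0.5) / n_frames
        # Quantiles of the assumed reference distribution (standard Gaussian):
        # the ideal equalizing transform sends x to these values.
        y = norm.ppf(p)
        # Least-squares polynomial approximation of that transform;
        # its coefficients characterize the segment's distribution shape.
        coeffs.append(np.polyfit(x, y, order))
    return np.concatenate(coeffs)
```

A segment of 13-dimensional frames with a 3rd-order fit thus yields a fixed-length 52-dimensional vector, regardless of segment duration, which is what makes these coefficients usable as segment-level features alongside statistics like the mean and variance.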
