A Noise-Robust FFT-Based Spectrum for Audio Classification

Recently, an early auditory model (K. Wang and S. Shamma, 1994) that calculates a so-called auditory spectrum has been employed in audio classification, where excellent performance is reported along with robustness in noisy environments. Unfortunately, this early auditory model is characterized by high computational requirements and the use of nonlinear processing. In this paper, inspired by the inherent self-normalization property of the early auditory model, we propose a simplified FFT-based spectrum that is noise-robust in audio classification. To evaluate the comparative performance of the proposed FFT-based spectrum, a three-class (i.e., speech, music, and noise) audio classification task is carried out, wherein a support vector machine (SVM) is employed as the classifier. Compared to a conventional FFT-based spectrum, both the original auditory spectrum and the proposed self-normalized FFT-based spectrum show more robust performance in noisy test cases. Test results also indicate that the performance of the self-normalized FFT-based spectrum is close to that of the original auditory spectrum, while its computational complexity is significantly lower.
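The abstract does not give the exact formulation of the proposed self-normalized spectrum, but the core idea of self-normalization can be illustrated by a minimal sketch: divide each frame's FFT magnitude spectrum by its own total energy, so that the spectral shape becomes invariant to overall gain. The function name and the specific normalization (division by the frame's magnitude sum) are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def self_normalized_fft_spectrum(frame, n_fft=512):
    # Illustrative sketch (not the paper's exact formulation):
    # magnitude spectrum divided by its own total magnitude,
    # making the resulting shape invariant to broadband gain.
    mag = np.abs(np.fft.rfft(frame, n=n_fft))
    total = np.sum(mag)
    return mag / total if total > 0 else mag

# A scaled copy of a signal yields the same normalized spectrum:
t = np.arange(256) / 8000.0
x = np.sin(2 * np.pi * 440 * t)
s1 = self_normalized_fft_spectrum(x)
s2 = self_normalized_fft_spectrum(10.0 * x)
assert np.allclose(s1, s2)  # gain invariance
```

This kind of per-frame normalization is one plausible reason for noise robustness: a broadband gain change affects all bins equally and is cancelled by the division, so only the relative spectral shape is passed to the classifier.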

[1] Thomas Hofmann et al., "Support vector machine learning for interdependent and structured output spaces," ICML, 2004.

[2] Lie Lu et al., "Content analysis for audio classification and segmentation," IEEE Trans. Speech Audio Process., 2002.

[3] David V. Anderson et al., "Low-power audio classification for ubiquitous sensor networks," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2004.

[4] Kuansan Wang et al., "Self-normalization and noise-robustness in early auditory representations," IEEE Trans. Speech Audio Process., 1994.

[5] Nima Mesgarani et al., "Speech discrimination based on multiscale spectro-temporal modulations," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2004.

[6] Georgios Tziritas et al., "A speech/music discriminator based on RMS and zero-crossings," IEEE Trans. Multimedia, 2005.

[7] Koby Crammer et al., "On the algorithmic implementation of multiclass kernel-based vector machines," J. Mach. Learn. Res., 2002.

[8] John Saunders et al., "Real-time discrimination of broadcast speech/music," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1996.

[9] Mounya Elhilali et al., "A spectro-temporal modulation index (STMI) for assessment of speech intelligibility," Speech Commun., 2003.

[10] C.-C. Jay Kuo et al., "Audio content analysis for online audiovisual data segmentation and classification," IEEE Trans. Speech Audio Process., 2001.

[11] Ying Li et al., "SVM-based audio classification for instructional video analysis," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2004.