A framework for robust MFCC feature extraction using SNR-dependent compression of enhanced mel filter bank energies

Mel-frequency cepstral coefficients (MFCCs) are the most widely used and successful features for speech recognition, but their performance degrades in the presence of additive noise. In this paper, we propose a noise compensation method for Mel filter bank energies, and hence for MFCC features. The method has two steps: Mel sub-band spectral subtraction, followed by compression of the Mel sub-band energies. For the compression step, we propose a sub-band SNR-dependent compression function, which replaces the logarithm used in conventional MFCC feature extraction when additive noise is present. Experimental results show that the proposed method significantly improves the performance of MFCC features in noisy conditions, reducing the word error rate by about 70% at an SNR of 0 dB for different types of additive noise.

Index Terms: Mel sub-bands, spectral subtraction, SNR-dependent compression, MFCC
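
The two-step pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the form of the compression function, so the root compression with an exponent interpolated linearly in sub-band SNR used here is a hypothetical stand-in, not the authors' function. All function names and the parameters `alpha`, `beta`, `gamma_min`, `gamma_max`, `snr_lo`, and `snr_hi` are likewise assumptions introduced for this example, and the noise estimate per Mel band is taken as given.

```python
import numpy as np

def subband_spectral_subtraction(noisy_energy, noise_energy, alpha=1.0, beta=0.01):
    """Subtract a per-band noise estimate from Mel filter bank energies.

    A spectral floor (beta * noisy_energy) keeps every band strictly positive.
    alpha and beta are illustrative values, not taken from the paper.
    """
    cleaned = noisy_energy - alpha * noise_energy
    return np.maximum(cleaned, beta * noisy_energy)

def subband_snr_db(enhanced_energy, noise_energy, eps=1e-10):
    """Estimate the SNR of each Mel sub-band in dB."""
    return 10.0 * np.log10((enhanced_energy + eps) / (noise_energy + eps))

def snr_dependent_compress(enhanced_energy, snr_db, gamma_min=0.1,
                           gamma_max=0.33, snr_lo=0.0, snr_hi=20.0):
    """Hypothetical compression used in place of the logarithm.

    Assumption: root compression E**gamma, where gamma grows linearly from
    gamma_min (at snr_lo and below) to gamma_max (at snr_hi and above).
    """
    t = np.clip((snr_db - snr_lo) / (snr_hi - snr_lo), 0.0, 1.0)
    gamma = gamma_min + t * (gamma_max - gamma_min)
    return enhanced_energy ** gamma

def dct_ii(x, n_out):
    """Unnormalized DCT-II over the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n)
    return x @ basis.T

def robust_mfcc(noisy_energy, noise_energy, n_ceps=13):
    """Enhance, compress, then decorrelate Mel energies of shape (frames, bands)."""
    enhanced = subband_spectral_subtraction(noisy_energy, noise_energy)
    snr_db = subband_snr_db(enhanced, noise_energy)
    compressed = snr_dependent_compress(enhanced, snr_db)
    return dct_ii(compressed, n_ceps)
```

For example, `robust_mfcc(mel_energies, noise_estimate, n_ceps=13)` takes an array of shape `(frames, bands)` and a per-band noise estimate of shape `(bands,)`. Conventional MFCC extraction corresponds to replacing `snr_dependent_compress` with `np.log`.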
