Robust MFCCs Derived from Differentiated Power Spectrum

The mel-scaled frequency cepstral coefficients (MFCCs) derived from Fourier transform and filter bank analysis are perhaps the most widely used front-end in state-of-the-art speech recognition systems. One of the major issues with MFCCs is that they are very sensitive to additive noise. To improve the robustness of speech front-ends with respect to noise, we introduce in this paper a new MFCC-like feature vector, which is estimated in three steps. First, the power spectrum of the speech signal is estimated through the fast Fourier transform (FFT). Then, the power spectrum is differentiated with respect to frequency. Finally, the differentiated power spectrum is transformed into MFCC-like coefficients. Speech recognition experiments on various tasks indicate that the new feature vector is more robust than traditional MFCCs in additive noise conditions.
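The three steps can be sketched in NumPy for a single frame. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the difference formula, how negative values of the differentiated spectrum are handled, or the filter bank settings. The forward difference between adjacent bins, the absolute value taken before the filter bank, the Hamming window, the 23 triangular mel filters, the 13 output coefficients, and all function names below are assumptions made for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, bin_freqs, sample_rate):
    """Triangular mel filters evaluated at the given bin frequencies (Hz)."""
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    fb = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct_ii(x, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(x)
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n) @ x

def dps_cepstrum(frame, sample_rate, n_fft=512, n_filters=23, n_ceps=13):
    # Step 1: power spectrum via the FFT (Hamming window is an assumption).
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2

    # Step 2: differentiate along frequency. A forward difference between
    # adjacent bins is assumed here; the abstract gives no formula.
    dps = power[:-1] - power[1:]

    # Step 3: MFCC-like coefficients. The difference can be negative, so the
    # absolute value is taken before the log (an assumption of this sketch).
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)[:-1]
    fb = mel_filterbank(n_filters, freqs, sample_rate)
    log_energies = np.log(fb @ np.abs(dps) + 1e-10)
    return dct_ii(log_energies, n_ceps)

# Usage on a synthetic 25 ms frame at 16 kHz (tone plus white noise).
rng = np.random.default_rng(0)
t = np.arange(400) / 16000.0
frame = np.sin(2 * np.pi * 440.0 * t) + 0.1 * rng.standard_normal(400)
coeffs = dps_cepstrum(frame, 16000)
```

One property of this sketch is that a component of the power spectrum that is constant across frequency cancels in the difference of adjacent bins. Whether this matches the motivation given in the paper cannot be confirmed from the abstract alone.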
