Using AR HMM state-dependent filtering for speech enhancement

In this paper we address the problem of enhancing speech that has been degraded by additive noise. Following Ephraim et al. (1989) [6], we model the clean speech with autoregressive hidden Markov models (AR-HMMs) and the noise with an autoregressive Gaussian. The filter applied to each frame of noisy speech is estimated from the noise model and from the autoregressive Gaussian with the highest a posteriori probability given the decoded state sequence. The success of this technique therefore depends heavily on accurately estimating the best state sequence. We propose a new strategy that combines cepstral-based HMMs, autoregressive HMMs, and a model combination technique. The intelligibility of the enhanced speech is assessed indirectly through speech recognition. We compare the performance obtained on noisy speech with noise-compensated models against the performance obtained on enhanced speech with clean-speech models. Recognition results on the enhanced speech are as good as our best results obtained with noise-compensated models.
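The per-frame filtering step can be illustrated with a minimal sketch. It assumes that the filter built from the selected speech AR Gaussian and the noise model takes the Wiener form.

Each AR model with gain g and coefficients a_1..a_p defines a power spectrum P(ω) = g² / |1 + Σ a_i e^{-jiω}|². The frame is then filtered with H(ω) = P_s(ω) / (P_s(ω) + P_n(ω)), where P_s comes from the speech component and P_n from the noise model.

Several choices here are illustrative assumptions, not the paper's implementation:
- the function and parameter names (`ar_power_spectrum`, `wiener_gains`, `enhance_frame`, `speech_gain`, and so on);
- the sign convention for A(z);
- the premise that the highest-posterior component has already been selected.

```python
import cmath
import math
from typing import List, Sequence


def ar_power_spectrum(gain: float, coeffs: Sequence[float], n_bins: int) -> List[float]:
    """Sample P(w) = gain**2 / |A(e^{jw})|**2 at w = 2*pi*k/n_bins, k = 0..n_bins-1.

    Assumed convention: A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, with `gain`
    the standard deviation of the white excitation.
    """
    spectrum = []
    for k in range(n_bins):
        w = 2.0 * math.pi * k / n_bins
        a = 1.0 + sum(c * cmath.exp(-1j * w * (i + 1)) for i, c in enumerate(coeffs))
        spectrum.append(gain ** 2 / abs(a) ** 2)
    return spectrum


def wiener_gains(speech_psd: Sequence[float], noise_psd: Sequence[float]) -> List[float]:
    """Per-bin Wiener gain H = P_s / (P_s + P_n); assumes strictly positive spectra."""
    return [s / (s + n) for s, n in zip(speech_psd, noise_psd)]


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (keeps the sketch dependency-free)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Inverse of dft(). The imaginary part is dropped: a gain that is symmetric
    in frequency, applied to a real frame, gives a real signal up to rounding."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def enhance_frame(
    noisy_frame: Sequence[float],
    speech_gain: float,
    speech_coeffs: Sequence[float],
    noise_gain: float,
    noise_coeffs: Sequence[float],
) -> List[float]:
    """Filter one frame with the Wiener gain built from the (already selected,
    hypothetical) speech AR component and the noise AR model."""
    n = len(noisy_frame)
    h = wiener_gains(
        ar_power_spectrum(speech_gain, speech_coeffs, n),
        ar_power_spectrum(noise_gain, noise_coeffs, n),
    )
    # Multiply each DFT bin of the noisy frame by its gain, then resynthesize.
    return idft([hk * xk for hk, xk in zip(h, dft(noisy_frame))])
```

For example, a low-pass speech model such as `[-0.9]` against white noise (`[]`) yields gains near 1 at low frequencies and much smaller gains near the Nyquist bin. If the speech and noise models are identical, every gain is 0.5. The naive DFT is only for readability. A practical enhancer would more likely use an FFT on windowed, overlapping frames with overlap-add resynthesis.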

[1] Richard M. Stern et al., Environmental robustness in automatic speech recognition, 1990, International Conference on Acoustics, Speech, and Signal Processing.

[2] Anthony J. Robinson et al., Enhancement and recognition of noisy speech within an autoregressive hidden Markov model framework using noise estimates from the noisy signal, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3] Mark J. F. Gales et al., Robust speech recognition in additive and convolutional noise using parallel model combination, 1995, Comput. Speech Lang.

[4] Jean-Luc Gauvain et al., Developments in continuous speech dictation using the 1995 ARPA NAB news task, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[5] Alan V. Oppenheim et al., All-pole modeling of degraded speech, 1978.

[6] Yariv Ephraim et al., On the application of hidden Markov models for enhancing noisy speech, 1989, IEEE Trans. Acoust. Speech Signal Process.

[7] Jean-Luc Gauvain et al., Development of spoken language corpora for travel information, 1995, EUROSPEECH.

[8] Mark J. F. Gales et al., Robust continuous speech recognition using parallel model combination, 1996, IEEE Trans. Speech Audio Process.

[9] S. Boll et al., Suppression of acoustic noise in speech using spectral subtraction, 1979.

[10] Jean-Luc Gauvain et al., Model compensation for noises in training and test data, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.