Improved noise robustness of word HMMs based on weighted variance expansion for noisy speech recognition

In real-life environments such as factories, noise spectra and signal-to-noise ratios can vary dramatically because of nonstationary heavy noise and changes in the surrounding environment. Because the performance of current speech recognition technology deteriorates noticeably when the noise type and signal-to-noise ratio at recognition time differ from those observed during training or adaptation, HMMs that are robust to such variation are needed. To improve the noise robustness of existing word HMMs, this paper proposes performing weighted variance expansion on each state and output distribution according to its power value. This prevents the output distributions of noise-susceptible states from varying significantly with differences in the observation vectors. We evaluate the method on a speaker-independent 50-word recognition task with two types of factory noise, using both clean-speech and noisy-speech HMMs under noise conditions (type of additive noise and signal-to-noise ratio) that differ from those observed in training. For all HMM types considered, recognition rates under these mismatched noise conditions improved over a wide range of signal-to-noise ratios, confirming that the method improves the noise robustness of HMMs for small-vocabulary noisy speech recognition. In particular, weighted variance expansion yields a significant improvement in recognition performance over constant expansion when the signal-to-noise ratio is lower than that observed in training. © 2005 Wiley Periodicals, Inc. Syst Comp Jpn, 36(13): 57–68, 2005; Published online in Wiley InterScience. DOI 10.1002/scj.20290
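The exact weighting function is not given in this summary, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes diagonal-covariance Gaussian output distributions, each tagged with a hypothetical `power` value (for example, the mean log energy of the frames aligned to it), and an assumed linear weighting in which the lowest-power distribution, presumed most susceptible to additive noise, receives the largest expansion `max_scale` and the highest-power one receives none. The names `Gaussian`, `expansion_factor`, `expand_variances`, and `constant_expansion` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    mean: list[float]
    var: list[float]  # diagonal covariance entries
    power: float      # hypothetical: mean log power of frames aligned to this distribution


def expansion_factor(power: float, p_min: float, p_max: float, max_scale: float) -> float:
    """Assumed weighting: max_scale at the lowest power, 1.0 (no expansion)
    at the highest power, linear in between."""
    if p_max == p_min:
        return max_scale
    t = (p_max - power) / (p_max - p_min)  # 0 at highest power, 1 at lowest
    return 1.0 + (max_scale - 1.0) * t


def expand_variances(gaussians: list[Gaussian], max_scale: float) -> list[Gaussian]:
    """Weighted expansion: each distribution's variances are scaled by a
    factor that depends on its power. Returns new objects; inputs are unchanged."""
    powers = [g.power for g in gaussians]
    p_min, p_max = min(powers), max(powers)
    expanded = []
    for g in gaussians:
        a = expansion_factor(g.power, p_min, p_max, max_scale)
        expanded.append(Gaussian(list(g.mean), [a * v for v in g.var], g.power))
    return expanded


def constant_expansion(gaussians: list[Gaussian], scale: float) -> list[Gaussian]:
    """Baseline for comparison: the same scale factor for every distribution."""
    return [Gaussian(list(g.mean), [scale * v for v in g.var], g.power) for g in gaussians]


def log_likelihood(x: list[float], g: Gaussian) -> float:
    """Log density of observation x under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, g.mean, g.var)
    )


if __name__ == "__main__":
    low = Gaussian([0.0, 0.0], [1.0, 1.0], power=10.0)
    high = Gaussian([0.0, 0.0], [1.0, 1.0], power=60.0)
    weighted = expand_variances([low, high], max_scale=2.0)
    for g in weighted:
        print(g.power, g.var)
    # A noise-shifted observation is penalized less by the expanded low-power distribution.
    x = [3.0, 3.0]
    print(log_likelihood(x, low), log_likelihood(x, weighted[0]))
```

Under these assumptions, a constant expansion flattens every distribution equally, whereas the weighted version flattens mainly the low-power ones, so a mismatched observation lowers their scores less while high-power distributions keep their discriminative sharpness.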
