Improving speech recognition in mobile environments by dynamic noise model adaptation

In this paper, we identify the long silence segments at the beginning and end of an utterance, which are the most noise-sensitive, as a major source of error in mobile-device-based speech recognition. Although these segments are highly dynamic, they are often poorly modeled by the HMM, so improving the silence modeling can improve the overall performance of a speech recognition system. We propose to dynamically adapt the silence model at the utterance level. Using a multilingual database collected in cars under a number of driving conditions, we show a significant improvement in speech recognition accuracy.
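
As a rough illustration of utterance-level silence adaptation, the sketch below re-estimates a silence model from the leading and trailing frames of each utterance. It is a minimal sketch only: the single diagonal-covariance Gaussian silence model, the fixed frame windows (`lead`, `trail`), and the interpolation weight `rho` are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch of utterance-level silence model adaptation.
# Assumption: the silence model is one diagonal-covariance Gaussian
# over cepstral features; the paper's actual HMM structure may differ.
import numpy as np


def adapt_silence_model(features, sil_mean, sil_var, lead=20, trail=20, rho=0.8):
    """Re-estimate the silence Gaussian from the leading/trailing frames.

    features : (T, D) array of cepstral feature vectors for one utterance
    sil_mean, sil_var : (D,) prior silence mean and diagonal variance
    lead, trail : number of frames assumed to be noise-only at each end
    rho : interpolation weight toward the utterance-specific statistics
    """
    # Frames at the start and end of the utterance are assumed to contain
    # background noise only (before the speaker starts / after they finish).
    noise_frames = np.vstack([features[:lead], features[-trail:]])

    # Utterance-specific noise statistics.
    utt_mean = noise_frames.mean(axis=0)
    utt_var = noise_frames.var(axis=0)

    # Interpolate the prior silence model toward the current acoustic
    # environment (a simple MAP-like smoothing).
    new_mean = rho * utt_mean + (1.0 - rho) * sil_mean
    new_var = rho * utt_var + (1.0 - rho) * sil_var
    return new_mean, new_var


if __name__ == "__main__":
    # Toy usage: 200 frames of 13-dimensional features.
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 13))
    mean0, var0 = np.zeros(13), np.ones(13)
    mean1, var1 = adapt_silence_model(feats, mean0, var0)
    print(mean1.shape, var1.shape)
```

In practice the adapted parameters would replace the silence-state output densities before decoding each utterance, so the recognizer's silence model tracks the current driving-noise condition rather than a fixed training-time estimate.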
