A harmonic-model-based front end for robust speech recognition

Speech recognition accuracy degrades significantly when the speech has been corrupted by noise, especially when the system has been trained on clean speech. Many compensation algorithms have been developed which require reliable online noise estimates or a priori knowledge of the noise. In situations where such estimates or knowledge is difficult to obtain, these methods fail. We present a new robustness algorithm which avoids these problems by making no assumptions about the corrupting noise. Instead, we exploit properties inherent to the speech signal itself to denoise the recognition features. In this method, speech is decomposed into harmonic and noise-like components, which are then processed independently and recombined. By processing noise-corrupted speech in this manner we achieve significant improvements in recognition accuracy on the Aurora 2 task.

[1]  David Pearce,et al.  The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions , 2000, INTERSPEECH.

[2]  S. Boll,et al.  Suppression of acoustic noise in speech using spectral subtraction , 1979 .

[3]  Liang Gu,et al.  Perceptual harmonic cepstral coefficients for speech recognition in noisy environment , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[4]  Alex Acero,et al.  Maximum a posteriori pitch tracking , 1998, ICSLP.

[5]  Christophe d'Alessandro,et al.  An iterative algorithm for decomposition of speech signals into periodic and aperiodic components , 1998, IEEE Trans. Speech Audio Process..

[6]  David Pearce,et al.  Harmonic tunnelling: tracking non-stationary noises during speech , 2001, INTERSPEECH.

[7]  Yannis Stylianou,et al.  HNM: a simple, efficient harmonic+noise model for speech , 1993, Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[8]  Mark J. F. Gales,et al.  Robust continuous speech recognition using parallel model combination , 1996, IEEE Trans. Speech Audio Process..