Why speech recognizers make errors? A robustness view

The performance of large-vocabulary speech recognizers often varies with the input speech and the quality of the trained models. Which particular attributes cause recognition errors has not been well studied. This paper addresses the question from a robustness perspective, using a large amount of field data collected from natural language dialog services. In particular, we present a method for tracking time-varying, or nonstationary, extraneous events such as music and background noise. We show that the resulting measure is a better predictor of recognition errors than a standard measure of stationary signal-to-noise ratio (SNR). Combining the two measures yields a data selection algorithm for detecting problematic speech.
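To illustrate the distinction between a stationary SNR and a measure of nonstationary events, the following is a minimal sketch and not the method of the paper, whose details the abstract does not give. It assumes a simple frame-energy approach: the stationary SNR is taken as the gap between loud and quiet frames over the whole utterance, and the nonstationarity measure as the drift of a windowed noise floor. All function names, the frame length, the window size, and the thresholds are hypothetical choices made for illustration.

```python
import math


def frame_energies_db(samples, frame_len=160):
    """Log energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def percentile(values, q):
    """Nearest-rank percentile for q in [0, 1]."""
    ordered = sorted(values)
    idx = int(round(q * (len(ordered) - 1)))
    return ordered[min(len(ordered) - 1, max(0, idx))]


def stationary_snr_db(energies_db):
    """One utterance-level number: loud frames versus quiet frames."""
    return percentile(energies_db, 0.95) - percentile(energies_db, 0.05)


def noise_floor_drift_db(energies_db, window=50):
    """Spread of the local noise floor across fixed windows of frames.

    A large value suggests the background changed during the utterance
    (for example, music starting), which a single SNR figure hides.
    """
    floors = [
        percentile(energies_db[i:i + window], 0.05)
        for i in range(0, len(energies_db) - window + 1, window)
    ]
    if len(floors) < 2:
        return 0.0
    return max(floors) - min(floors)


def is_problematic(samples, min_snr_db=15.0, max_drift_db=10.0):
    """Flag an utterance if either measure crosses its (assumed) threshold."""
    energies = frame_energies_db(samples)
    low_snr = stationary_snr_db(energies) < min_snr_db
    drifting = noise_floor_drift_db(energies) > max_drift_db
    return low_snr or drifting
```

In this sketch an utterance whose background level rises partway through can keep a high utterance-level SNR, because the quietest frames still come from the clean portion, while the floor drift exposes the change. That is one plausible reason for combining the two measures, stated here as an assumption rather than as the paper's finding.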
