Dynamic search-space pruning for time-constrained speech recognition

In automatic speech recognition, complex state spaces are searched during the recognition process. Limiting these search spaces reduces computation time, but usually lowers the recognition rate as well. For time-critical recognition tasks, however, search-space pruning is necessary. We therefore developed a dynamic mechanism that optimizes the pruning parameters for time-constrained recognition tasks, e.g. speech recognition for robotic systems, with respect to both word accuracy and computation time. With this mechanism, an automatic speech recognition system can process speech signals at an approximately constant processing rate. Compared to a system without such a dynamic mechanism and with the same computation time available, the variance of the processing rate is greatly reduced without a significant loss of word accuracy. Furthermore, the extended system can be sped up to real-time processing if desired or necessary.
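The abstract does not state the control law used to adapt the pruning parameters. As a rough illustration only, the Python sketch below assumes time-synchronous beam pruning (hypotheses whose cost exceeds the best cost by more than a beam width are discarded) combined with a simple proportional feedback rule that narrows or widens the beam so that the measured real-time factor (processing time divided by audio duration) tracks a target value. The names `beam_prune`, `BeamController`, `target_rtf` and `gain`, and the multiplicative update itself, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


def beam_prune(costs: dict[str, float], beam: float) -> dict[str, float]:
    """Keep only hypotheses whose cost lies within `beam` of the best one.

    Costs are negative log-scores, so lower is better.
    """
    if not costs:
        return {}
    best = min(costs.values())
    return {h: c for h, c in costs.items() if c <= best + beam}


@dataclass
class BeamController:
    """Hypothetical feedback controller: adapts the beam width so that the
    measured real-time factor (RTF) tracks `target_rtf`."""
    target_rtf: float      # e.g. 1.0 means real-time processing
    beam: float            # current beam width
    min_beam: float        # lower bound, protects word accuracy
    max_beam: float        # upper bound, limits worst-case search effort
    gain: float = 0.5      # assumed proportional gain

    def update(self, processing_time: float, audio_duration: float) -> float:
        """Adjust the beam after decoding one chunk of audio."""
        rtf = processing_time / audio_duration
        # Positive error: faster than required, so the beam may widen.
        # Negative error: too slow, so the beam must narrow.
        error = (self.target_rtf - rtf) / self.target_rtf
        self.beam *= 1.0 + self.gain * error
        # Clamp to the allowed range (this also catches negative factors).
        self.beam = max(self.min_beam, min(self.max_beam, self.beam))
        return self.beam


if __name__ == "__main__":
    print(beam_prune({"a": 10.0, "b": 12.0, "c": 30.0}, beam=5.0))
    # {'a': 10.0, 'b': 12.0}

    ctrl = BeamController(target_rtf=1.0, beam=200.0,
                          min_beam=50.0, max_beam=400.0)
    print(ctrl.update(processing_time=1.5, audio_duration=1.0))  # 150.0
    print(ctrl.update(processing_time=0.5, audio_duration=1.0))  # 187.5
```

Clamping the beam to [`min_beam`, `max_beam`] keeps the controller from pruning so aggressively that word accuracy collapses, or so loosely that the processing time of a single chunk becomes unbounded; the multiplicative update scales each correction with the current beam width.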
