Integration of parsing and incremental speech recognition

In this paper we propose a new approach to integrating a parser into a statistical speech recognizer. The method incrementally applies grammatical restrictions and robustly combines them with the statistical acoustic and language models. On spontaneous speech data, it achieved an 11.6% reduction in word error rate compared to the baseline system, which applies statistical models only.
