Reliable utterance segment recognition by integrating a grammar with statistical language constraints

This paper proposes a novel approach to recognizing both complete utterances and partial utterance segments with a high level of confidence in the results. The method uses a conventional n-gram constraint cooperatively with additional grammatical constraints that take deviations from the grammar into account, combining the two in a multi-pass search strategy. Partial utterance segments that satisfy both the n-gram and the grammatical constraints are output as high-confidence results. To improve efficiency, the context-free grammar that expresses the grammatical constraints is approximated by a finite-state automaton. When applying the grammatical constraints, we consider all kinds of deviation from the grammar, including insertions, deletions, and substitutions. As a result, the grammatical constraints can be applied more robustly than with a conventional word-skipping robust parser, which handles only one type of deviation, namely insertions. Our experiments confirm that the proposed method recognizes partial utterance segments more reliably than conventional continuous speech recognition using n-grams alone. The results also indicate that allowing more types of deviation from the grammatical constraints yields better performance than the conventional word-skipping robust parser approach.
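The difference between a word-skipping parser and a fuller treatment of deviations can be illustrated with a weighted edit distance between a word sequence and a finite-state automaton. The sketch below is an illustration under stated assumptions, not the paper's algorithm: the arc-list automaton representation, the unit costs, and the names `robust_match_cost` and `GRAMMAR_ARCS` are all hypothetical. With `allow_del_sub=False` only exact matches and insertions (skipped input words) are permitted, which mimics a word-skipping parser; with it enabled, deletions and substitutions are allowed as well.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical FSA representation: state -> outgoing (word, next_state) arcs.
Arcs = Dict[int, List[Tuple[str, int]]]


def robust_match_cost(words: List[str], arcs: Arcs, start: int, finals: Set[int],
                      ins_cost: float = 1.0, del_cost: float = 1.0,
                      sub_cost: float = 1.0, allow_del_sub: bool = True) -> float:
    """Minimum deviation cost of aligning `words` with some accepting path.

    Exact matches cost 0. Insertions (extra input words) are always allowed;
    deletions (grammar words missing from the input) and substitutions are
    allowed only when allow_del_sub is True. Costs are assumed non-negative.
    Returns math.inf if no alignment reaches a final state.
    """
    states = {start} | set(finals) | set(arcs)
    states |= {r for out in arcs.values() for _, r in out}

    def close_deletions(cost: Dict[int, float]) -> Dict[int, float]:
        # Follow arcs without consuming input (deletions) until costs stop improving.
        changed = True
        while changed:
            changed = False
            for q, out in arcs.items():
                for _, r in out:
                    if cost[q] + del_cost < cost[r]:
                        cost[r] = cost[q] + del_cost
                        changed = True
        return cost

    cost = {q: math.inf for q in states}
    cost[start] = 0.0
    if allow_del_sub:
        cost = close_deletions(cost)
    for w in words:
        nxt = {q: math.inf for q in states}
        for q in states:
            if cost[q] == math.inf:
                continue
            # Insertion: skip the input word and stay in the same state.
            nxt[q] = min(nxt[q], cost[q] + ins_cost)
            for label, r in arcs.get(q, []):
                if label == w:
                    nxt[r] = min(nxt[r], cost[q])  # exact match
                elif allow_del_sub:
                    nxt[r] = min(nxt[r], cost[q] + sub_cost)  # substitution
        cost = close_deletions(nxt) if allow_del_sub else nxt
    return min((cost[q] for q in finals), default=math.inf)


# Hypothetical toy automaton for "i would like a (single|double) room".
GRAMMAR_ARCS: Arcs = {
    0: [("i", 1)], 1: [("would", 2)], 2: [("like", 3)],
    3: [("a", 4)], 4: [("single", 5), ("double", 5)], 5: [("room", 6)],
}

if __name__ == "__main__":
    for utt in ["i would like a single room",     # exact
                "i would uh like a single room",  # insertion
                "i would like single room",       # deletion
                "i would like a twin room"]:      # substitution
        full = robust_match_cost(utt.split(), GRAMMAR_ARCS, 0, {6})
        skip = robust_match_cost(utt.split(), GRAMMAR_ARCS, 0, {6}, allow_del_sub=False)
        print(f"{utt!r}: full={full} word-skipping={skip}")
```

On this toy automaton, the utterance with a filler word costs 1 in both modes, but the utterances with a missing or substituted word cost 1 under the full model and cannot be matched at all in insertion-only mode. That is the kind of robustness gap between the two approaches that the abstract describes.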
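The abstract does not say how the context-free grammar is turned into a finite-state automaton. One simple possibility, shown here only as an illustration and not necessarily the technique used in the paper, is to bound the nesting depth of nonterminal expansions, enumerate the resulting finite language, and store it in a deterministic prefix-tree automaton. This yields an under-approximation: recursion deeper than the bound is lost. The toy grammar and the names `expand`, `build_prefix_tree_fsa`, and `accepts` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy grammar: nonterminal -> list of right-hand sides.
# Any symbol that is not a key is treated as a terminal word.
Grammar = Dict[str, List[List[str]]]
GRAMMAR: Grammar = {
    "S": [["i", "want", "NP"]],
    "NP": [["a", "ROOM"]],
    "ROOM": [["room"], ["ADJ", "room"]],
    "ADJ": [["big"], ["very", "ADJ"]],  # recursive rule
}


def expand(grammar: Grammar, symbol: str, depth: int) -> Set[Tuple[str, ...]]:
    """Word strings derivable from `symbol` with at most `depth` nested nonterminals."""
    if symbol not in grammar:
        return {(symbol,)}
    if depth == 0:
        return set()  # expansion budget exhausted: drop this derivation
    results: Set[Tuple[str, ...]] = set()
    for rhs in grammar[symbol]:
        partial: Set[Tuple[str, ...]] = {()}
        for sym in rhs:
            sub = expand(grammar, sym, depth - 1)
            partial = {p + s for p in partial for s in sub}
            if not partial:
                break
        results |= partial
    return results


def build_prefix_tree_fsa(sentences) -> Tuple[Dict[int, List[Tuple[str, int]]], Set[int]]:
    """Deterministic prefix-tree automaton (start state 0) accepting exactly `sentences`."""
    arcs: Dict[int, List[Tuple[str, int]]] = {0: []}
    finals: Set[int] = set()
    for sent in sorted(sentences):
        state = 0
        for word in sent:
            nxt = next((r for w, r in arcs[state] if w == word), None)
            if nxt is None:
                nxt = len(arcs)  # allocate a new state
                arcs[nxt] = []
                arcs[state].append((word, nxt))
            state = nxt
        finals.add(state)
    return arcs, finals


def accepts(arcs, finals, words) -> bool:
    """Run the deterministic automaton over `words`."""
    state = 0
    for word in words:
        nxt = next((r for w, r in arcs[state] if w == word), None)
        if nxt is None:
            return False
        state = nxt
    return state in finals


if __name__ == "__main__":
    arcs, finals = build_prefix_tree_fsa(expand(GRAMMAR, "S", 5))
    for utt in ["i want a very big room", "i want a very very big room"]:
        print(utt, accepts(arcs, finals, utt.split()))
```

With a bound of 5, "i want a very big room" is accepted while "i want a very very big room" is rejected; raising the bound to 6 admits the second sentence. Enumeration grows quickly with the bound, so a practical system would likely construct the automaton more directly rather than listing every sentence.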
