A one-stage decoder for interpretation of natural speech

Current speech understanding systems are typically designed as multistage systems, although in theory this design gives rise to errors caused by early decisions. We present a framework that offers the chance of reducing these errors with an integrated system that computes a semantic tree representation directly from the input speech signal, using a token-passing-based one-stage decoder called ODINS. To limit the complexity of ODINS, we represent all a priori knowledge consistently in a generalized uniform knowledge model based on a hierarchy of probabilistic transition networks, which can also be n-grams. Our framework includes a method for evaluating the system output using an edit-distance-based tree matching algorithm. First experiments quantify and confirm the theoretical advantage of the one-stage strategy over a corresponding two-stage approach.
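To make the token passing idea concrete, the following is a minimal sketch and not the ODINS implementation. It assumes a single flat probabilistic transition network rather than a hierarchy, and discrete observation symbols rather than acoustic features. All names (`Token`, `token_passing`, the toy network) are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    log_score: float  # accumulated log probability of the path
    history: tuple    # states visited so far (illustrative stand-in for a result structure)


def token_passing(transitions, emissions, start, finals, observations):
    """Return the best-scoring Token that ends in a final state, or None.

    transitions: {state: [(next_state, probability), ...]}
    emissions:   {state: {observation: probability}}
    """
    tokens = {start: Token(0.0, ())}
    for obs in observations:
        new_tokens = {}
        for state, tok in tokens.items():
            for nxt, p_trans in transitions.get(state, []):
                p_emit = emissions.get(nxt, {}).get(obs, 0.0)
                if p_trans <= 0.0 or p_emit <= 0.0:
                    continue  # impossible step, no token is passed
                cand = Token(
                    tok.log_score + math.log(p_trans) + math.log(p_emit),
                    tok.history + (nxt,),
                )
                best = new_tokens.get(nxt)
                # Keep only the best token per state.
                if best is None or cand.log_score > best.log_score:
                    new_tokens[nxt] = cand
        tokens = new_tokens
    ending = [t for s, t in tokens.items() if s in finals]
    return max(ending, key=lambda t: t.log_score, default=None)


# Toy network (assumed values, not taken from the paper).
transitions = {
    "start": [("a", 0.6), ("b", 0.4)],
    "a": [("a", 0.5), ("b", 0.5)],
    "b": [("b", 1.0)],
}
emissions = {
    "a": {"x": 0.9, "y": 0.1},
    "b": {"x": 0.2, "y": 0.8},
}
best = token_passing(transitions, emissions, "start", {"b"}, ["x", "x", "y"])
print(best)
```

Because each state keeps only its best incoming token per input step, the search makes no hard decision until the final token is read out. This is the property that a one-stage design aims to preserve across all knowledge levels.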
