Integrating multiple layers of concept information into n-gram modeling for spoken language understanding

This paper presents a novel approach that integrates multi-layer concept information into a trigram language model to improve understanding accuracy in spoken dialogue systems. The approach substantially improves recognition accuracy and alleviates the out-of-grammar problem. In an experiment on a real-world air-ticket information spoken dialogue system for Mandarin Chinese, it achieves a relative concept error rate reduction of 33%.
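
For readers unfamiliar with the metric, a relative error rate reduction is conventionally computed as shown below. The symbols are assumptions introduced here for illustration, not notation from the paper: CER_base denotes the concept error rate of the baseline system and CER_new that of the proposed model. The abstract does not report the absolute rates, so none are filled in.

```latex
\[
  \text{relative reduction}
  = \frac{\mathrm{CER}_{\text{base}} - \mathrm{CER}_{\text{new}}}{\mathrm{CER}_{\text{base}}}
  \approx 0.33
\]
```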
