A multi-stage approach for Thai spoken language understanding

This article investigates a novel multi-stage approach for spoken language understanding (SLU), with an application to a pioneering Thai spoken dialogue system in a hotel reservation domain. Given an input word string, the system determines a goal and concept-values through three processing stages: concept extraction, goal identification, and concept-value recognition. Concept extraction uses weighted finite state transducers (WFSTs) to extract concepts from the word string. Given the extracted concepts, the goal of the utterance is identified using a pattern classifier. Within the identified goal, the necessary concept-values are recognized from the WFST outputs produced in the concept extraction stage. A new logical N-gram model, which combines the conventional N-gram parser with a regular grammar, is evaluated for concept extraction and concept-value recognition. Several classifiers are optimized and compared for goal identification. An advantage of the proposed SLU model is that it can be trained on a partially annotated corpus, in which only the relevant keywords and the goal of each training utterance are required. Although the proposed model is evaluated only on the Thai hotel reservation system, the SLU approach itself is general and is expected to be applicable to other languages once training data is available.
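The three-stage flow described above can be illustrated with a minimal sketch. It is not the article's implementation: regular expressions stand in for the WFSTs and the logical N-gram model, a nearest-neighbour rule over concept sets stands in for the pattern classifiers, and English words stand in for Thai. All concept names, goal names, patterns, and training examples are hypothetical.

```python
import re

# Stage 1 resources: hypothetical concept patterns.
# Regexes are a stand-in for the WFST-based extractor in the article.
CONCEPT_PATTERNS = {
    "num_nights": re.compile(r"\b(\d+) nights?\b"),
    "room_type": re.compile(r"\b(single|double|twin) room\b"),
    "check_in": re.compile(
        r"\bfrom (monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b"
    ),
}

# Stage 3 resources: hypothetical goals and the concepts each goal needs.
GOAL_SLOTS = {
    "reserve_room": ["room_type", "num_nights", "check_in"],
    "ask_price": ["room_type"],
}

# Stage 2 resources: toy partially annotated data, where each utterance is
# reduced to the set of concepts it contains plus its goal label.
TRAINING = [
    ({"room_type", "num_nights", "check_in"}, "reserve_room"),
    ({"room_type", "num_nights"}, "reserve_room"),
    ({"room_type"}, "ask_price"),
]


def extract_concepts(words):
    """Stage 1: map each concept found in the word string to its matched value."""
    found = {}
    for name, pattern in CONCEPT_PATTERNS.items():
        match = pattern.search(words)
        if match:
            found[name] = match.group(1)
    return found


def identify_goal(concepts):
    """Stage 2: return the goal of the most similar training example.

    Similarity is Jaccard overlap between concept sets; this simple rule is
    an assumption standing in for the classifiers compared in the article.
    """
    def jaccard(example):
        example_concepts, _ = example
        union = concepts | example_concepts
        return len(concepts & example_concepts) / len(union)

    return max(TRAINING, key=jaccard)[1]


def recognize_values(goal, extracted):
    """Stage 3: keep only the concept-values that the identified goal needs."""
    return {slot: extracted[slot] for slot in GOAL_SLOTS[goal] if slot in extracted}


def understand(words):
    """Run the three stages in order on one word string."""
    extracted = extract_concepts(words.lower())
    goal = identify_goal(set(extracted))
    return goal, recognize_values(goal, extracted)
```

Calling `understand("I want a double room for 3 nights from Friday")` runs extraction first, chooses a goal from the extracted concept set alone, and then filters the already extracted values by that goal, mirroring the abstract's point that concept-values are read from the outputs of the extraction stage rather than recomputed.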
