Spoken language understanding

This article serves as an introduction to statistical spoken language understanding (SLU). It is based on the mainstream statistical modeling approach, which shares its mathematical framework with many other statistical pattern recognition applications, such as speech recognition. In particular, we formulate a number of statistical SLU models from the literature as extensions of hidden Markov models (HMMs) in the form of segment models: each underlying Markov state corresponds to one semantic slot defined by the application domain and generates a multiple-word block (segment) with word dependency. In the past, partly because it relies on symbolic rather than numeric processing, the important field of SLU in human language technology has not been widely exposed to the signal processing research community. However, many key techniques in SLU originated in statistical signal processing. SLU is also becoming increasingly important as a major target application area of automatic speech recognition (ASR), a field that has been dear to many signal processing researchers. We therefore contribute this article as a natural bridge between ASR and SLU in their methodological and mathematical foundations. It is our hope that, once the mathematical basis of SLU becomes well known through this introductory article, more powerful techniques established by signal processing researchers will further advance SLU into a solid application area, making speech technology a successful component of intelligent human-machine communication.
