Incorporating discourse features into confidence scoring of intention recognition results in spoken dialogue systems

This paper proposes a method for scoring the confidence of intention recognition results in spoken dialogue systems. To accomplish a task, a spoken dialogue system has to recognize the user's intentions. Speech recognition errors and ambiguity in user utterances, however, sometimes prevent it from recognizing them correctly. Confidence scoring allows errors in intention recognition results to be detected and has proved useful for dialogue management. Conventional methods compute the confidence score from features of the speech recognition results for single utterances. This may be insufficient, because an intention recognition result is the product of discourse processing over multiple utterances rather than of a single utterance. We propose incorporating discourse features to score the confidence of intention recognition results more accurately. Experimental results show that incorporating discourse features significantly improves the confidence scoring.
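The contrast between utterance-level and discourse-level features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the feature set, the classifier, or any weights, so the feature names (`asr_confidence`, `turns_since_filled`, `times_overwritten`), the logistic form, and the numeric values below are hypothetical and are not the authors' method.

```python
import math
from dataclasses import dataclass


@dataclass
class SlotHypothesis:
    """One recognized intention (e.g. a filled slot). All fields are hypothetical."""
    asr_confidence: float    # utterance-level: recognizer confidence in [0, 1]
    turns_since_filled: int  # discourse-level: turns the value has survived unchanged
    times_overwritten: int   # discourse-level: how often the value changed so far


# Hand-picked illustrative weights; a real system would learn them from labeled dialogues.
WEIGHTS = {"asr_confidence": 3.0, "turns_since_filled": 0.4, "times_overwritten": -1.2}
BIAS = -1.5


def confidence(h: SlotHypothesis, use_discourse: bool = True) -> float:
    """Logistic score in (0, 1); higher means the hypothesis is more likely correct."""
    z = BIAS + WEIGHTS["asr_confidence"] * h.asr_confidence
    if use_discourse:
        # Discourse features describe the slot's history across turns,
        # which a single-utterance score cannot see.
        z += WEIGHTS["turns_since_filled"] * h.turns_since_filled
        z += WEIGHTS["times_overwritten"] * h.times_overwritten
    return 1.0 / (1.0 + math.exp(-z))


# Two hypotheses with identical recognizer confidence but different histories.
stable = SlotHypothesis(asr_confidence=0.6, turns_since_filled=3, times_overwritten=0)
unstable = SlotHypothesis(asr_confidence=0.6, turns_since_filled=0, times_overwritten=2)

for name, h in [("stable", stable), ("unstable", unstable)]:
    print(name, round(confidence(h, use_discourse=False), 3), round(confidence(h), 3))
```

With utterance-level features alone the two hypotheses receive the same score; once the hypothetical discourse features are added, the value that has been stable across turns scores higher than the one that was repeatedly overwritten.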

Katsuhito Sudoh, et al. Incorporating Discourse Features into Confidence Scoring of Intention Recognition Results in Spoken Dialogue Systems, 2005, ICASSP.