Using knowledge of misunderstandings to increase the robustness of spoken dialogue systems

This paper proposes a new technique to increase the robustness of spoken dialogue systems. It relies on an automatic procedure that corrects frames incorrectly generated by the system's spoken language understanding component. To this end, the technique is trained using knowledge of previous system misunderstandings. The correction is transparent to the user, who is unaware of some of the mistakes made by the speech recogniser, so interaction with the system can proceed more naturally. Experiments have been carried out using two spoken dialogue systems previously developed in our lab, Saplen and Viajero, each employing both prompt-dependent and prompt-independent language models for speech recognition. The results obtained from 10,000 simulated dialogues show that the technique improves the performance of both systems with both kinds of language model, especially with the prompt-independent one. With this type of model, the Saplen system increases sentence understanding by 19.54%, task completion by 26.25%, word accuracy by 7.53%, and implicit recovery of speech recognition errors by 20.3%, whereas for the Viajero system the corresponding increases are 14.93%, 18.06%, 6.98% and 15.63%, respectively.
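The abstract does not detail the correction algorithm itself. As a purely illustrative sketch, assume a frame is a tuple of (slot, value) pairs in a fixed order, and that training dialogues (for example, simulated ones, where the user simulator knows what it meant) record both the frame the understanding component produced and the frame the user intended. One simple way to exploit knowledge of past misunderstandings is then a table of observed frame confusions. The class name `MisunderstandingCorrector`, the thresholds `min_count` and `min_ratio`, and the frame format are hypothetical assumptions, not the authors' method.

```python
from collections import Counter, defaultdict

# Hypothetical frame format (an assumption): a tuple of (slot, value) pairs in a
# fixed slot order, e.g. (("intent", "order"), ("product", "beer")).


class MisunderstandingCorrector:
    """Illustrative sketch: remember which frames the understanding component
    produced by mistake and what the user actually meant, then substitute the
    most likely intended frame at run time."""

    def __init__(self, min_count=2, min_ratio=0.5):
        # Assumed thresholds: a correction is applied only if it was observed at
        # least `min_count` times and covers at least `min_ratio` of all the
        # outcomes recorded for that obtained frame.
        self.min_count = min_count
        self.min_ratio = min_ratio
        self.table = defaultdict(Counter)

    def train(self, pairs):
        # pairs: iterable of (obtained_frame, intended_frame) collected from
        # previous dialogues in which the intended frame is known.
        for obtained, intended in pairs:
            self.table[obtained][intended] += 1

    def correct(self, obtained):
        outcomes = self.table.get(obtained)  # .get() does not add new keys
        if not outcomes:
            return obtained  # frame never seen in training: leave it unchanged
        best, count = outcomes.most_common(1)[0]
        total = sum(outcomes.values())
        if count >= self.min_count and count / total >= self.min_ratio:
            return best
        return obtained


intended = (("product", "beer"),)
misheard = (("product", "pear"),)  # hypothetical recognition error
corrector = MisunderstandingCorrector()
corrector.train([(misheard, intended), (misheard, intended), (misheard, misheard)])
print(corrector.correct(misheard))  # (('product', 'beer'),)
```

The thresholds keep the sketch conservative: a frame is rewritten only when past evidence clearly favours one intended frame, so rare or ambiguous confusions pass through unchanged rather than being "corrected" into a new error.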
