Using knowledge on word-islands to improve the performance of spoken dialogue systems

This paper proposes a technique for improving the performance of spoken dialogue systems. The technique uses two knowledge sources: the semantic frames the system employs to understand spoken language, and the words of the application domain that fill the frame slots. Using both, the technique identifies specific word sequences, which we call word-islands. These islands are then used to create the language models and dictionary of the system's speech recogniser. Word-islands are easier to recognise than the individual words that make them up, which improves spoken language understanding and overall system performance. Experiments were conducted with two spoken dialogue systems previously developed in our lab. One provides fast food information and the other provides bus travel information. Results show that the proposed technique improves the performance of both systems. It does so by improving speech recognition and spoken language understanding for sentence types that are difficult to process.
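The abstract does not specify how word-islands are selected or exactly how they enter the recogniser's models. The sketch below is one illustration under stated assumptions and is not the authors' implementation. It takes a hand-supplied list of island word sequences; the `islands` list and example sentences are hypothetical. Each training sentence is rewritten so that every island becomes a single underscore-joined token, using greedy longest match. The sketch then collects the recogniser vocabulary and bigram counts over these merged units.

```python
from collections import Counter
from typing import Iterable


def merge_islands(tokens: list[str], islands: Iterable[tuple[str, ...]]) -> list[str]:
    """Replace each occurrence of an island word sequence with one token.

    Assumption: islands are given as tuples of words and are merged by
    greedy longest match, joined with underscores (e.g. "bus_station").
    """
    # Drop empty tuples and try longer islands first so they win over prefixes.
    ordered = sorted({isl for isl in islands if isl}, key=len, reverse=True)
    merged: list[str] = []
    i = 0
    while i < len(tokens):
        for island in ordered:
            n = len(island)
            if tuple(tokens[i:i + n]) == island:
                merged.append("_".join(island))
                i += n
                break
        else:
            # No island starts here: keep the single word.
            merged.append(tokens[i])
            i += 1
    return merged


def build_vocab_and_bigrams(sentences: list[str], islands: list[tuple[str, ...]]):
    """Collect vocabulary and bigram counts over island-merged sentences."""
    vocab: set[str] = set()
    bigrams: Counter = Counter()
    for sentence in sentences:
        # Sentence boundary markers are added for the bigram counts only.
        units = ["<s>"] + merge_islands(sentence.split(), islands) + ["</s>"]
        vocab.update(units[1:-1])
        bigrams.update(zip(units, units[1:]))
    return vocab, bigrams


# Hypothetical islands and sentences, for illustration only.
islands = [("ham", "sandwich"), ("bus", "station")]
sentences = ["i want a ham sandwich", "when does the bus leave the bus station"]
vocab, bigrams = build_vocab_and_bigrams(sentences, islands)
```

In this example "bus station" becomes the single unit `bus_station`, while the standalone "bus" stays an ordinary word. A full system would also need a pronunciation entry for each island token, for instance by concatenating the pronunciations of its component words. It would also need smoothing of the counts. Neither step is shown.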
