Assessment of dialogue systems by means of a new simulation technique

In recent years, a question of great interest has been the development of tools and techniques to facilitate the evaluation of dialogue systems. These systems can be evaluated from various points of view, such as recognition and understanding rates, dialogue naturalness, and robustness against recognition errors. Evaluation usually requires compiling a large corpus of words and sentences uttered by users, relevant to the application domain the system is designed for. This paper proposes a new technique that makes it possible to reuse such a corpus for the evaluation and to check the performance of the system when different dialogue strategies are used. The technique is based on the automatic generation of conversations between the dialogue system and an additional dialogue system, called the user simulator, that represents the user's interaction with the dialogue system. The technique has been applied to evaluate a dialogue system developed in our lab using two different recognition front-ends and two different dialogue strategies to handle user confirmations. The experiments show that the prompt-dependent recognition front-end achieves better results, but that this front-end is appropriate only if users limit their utterances to those related to the current system prompt. The prompt-independent front-end achieves inferior results, but enables users to utter any permitted utterance at any time, irrespective of the system prompt. Consequently, this front-end may allow a more natural and comfortable interaction. The experiments also show that the re-prompting confirmation strategy enhances system performance for both recognition front-ends.
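The idea of generating conversations automatically between a dialogue system and a user simulator that reuses a corpus can be illustrated with a minimal sketch. It is not the implementation described in the paper: the corpus contents, the prompt names, the toy recognizer with a fixed error rate, and all function names are assumptions made for illustration only.

```python
import random

# Hypothetical corpus: user utterances grouped by the system prompt they answer.
CORPUS = {
    "ask_food": ["a ham sandwich", "one pizza"],
    "ask_drink": ["a cola", "orange juice"],
}

def user_simulator(prompt, rng):
    """Answer the current system prompt by reusing an utterance from the corpus."""
    return rng.choice(CORPUS[prompt])

def recognize(utterance, error_rate, rng):
    """Toy recognizer (an assumption): fails with probability error_rate."""
    return None if rng.random() < error_rate else utterance

def run_dialogue(error_rate, max_turns=20, seed=0):
    """Generate one simulated conversation and report simple statistics."""
    rng = random.Random(seed)
    pending = list(CORPUS)   # prompts the system still has to get answered
    filled = {}              # prompt -> recognized utterance
    turns = 0
    while pending and turns < max_turns:
        prompt = pending[0]
        turns += 1
        heard = recognize(user_simulator(prompt, rng), error_rate, rng)
        if heard is None:
            continue         # recognition failed: repeat the same prompt
        filled[prompt] = heard
        pending.pop(0)
    return {"completed": not pending, "turns": turns, "slots": filled}

if __name__ == "__main__":
    for rate in (0.0, 0.3, 0.6):
        print(rate, run_dialogue(rate))
```

Because the simulator draws its answers from a fixed corpus, the same utterances can be replayed under different settings (here, only the assumed error rate varies), which is the kind of comparison the abstract describes across front-ends and confirmation strategies.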
