Empirically Evaluating an Adaptable Spoken Dialogue System

Recent technological advances have made it possible to build real-time, interactive spoken dialogue systems for a wide variety of applications. However, when users do not respect the limitations of such systems, performance typically degrades. Although users differ with respect to their knowledge of system limitations, and although different dialogue strategies make system limitations more apparent to users, most current systems do not try to improve performance by adapting dialogue behavior to individual users. This paper presents an empirical evaluation of TOOT, an adaptable spoken dialogue system for retrieving train schedules on the web. We conduct an experiment in which 20 users carry out 4 tasks with both adaptable and non-adaptable versions of TOOT, resulting in a corpus of 80 dialogues. The values for a wide range of evaluation measures are then extracted from this corpus. Our results show that adaptable TOOT generally outperforms non-adaptable TOOT, and that the utility of adaptation depends on TOOT’s initial dialogue strategies.
