Evaluating Interactive Dialogue Systems: Extending Component Evaluation to Integrated System Evaluation

This paper surveys the ways in which spoken dialogue system components have been evaluated and discusses approaches that attempt to integrate component evaluation into an overall view of system performance. We argue that the PARADISE (PARAdigm for DIalogue System Evaluation) framework has several advantages over other proposals.
