PARADISE: A Framework for Evaluating Spoken Dialogue Agents

This paper presents PARADISE (PARAdigm for DIalogue System Evaluation), a general framework for evaluating spoken dialogue agents. The framework decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to performance, and makes it possible to compare agents performing different tasks by normalizing for task complexity.
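The abstract does not state the performance function itself. As a minimal sketch, assuming the formulation usually associated with PARADISE, task success is measured by the Kappa coefficient over a confusion matrix of attribute values, and performance is a weighted combination of normalized task success minus normalized dialogue costs. The function names, the example cost measure ("turns"), and the weights are hypothetical; in practice the weights would be estimated by regressing user satisfaction on the normalized measures. Using the population standard deviation for the z-score normalization is also an assumption.

```python
from statistics import mean, pstdev


def kappa(confusion):
    """Kappa coefficient from a square confusion matrix.

    Rows are the values observed in the dialogue; columns are the
    values in the task key. Chance agreement is estimated from the
    column totals.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    p_agree = sum(confusion[i][i] for i in range(n)) / total
    col_totals = [sum(row[j] for row in confusion) for j in range(n)]
    p_chance = sum((t / total) ** 2 for t in col_totals)
    return (p_agree - p_chance) / (1 - p_chance)


def z_scores(values):
    """Normalize values to zero mean and unit variance (assumption: population std)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def performance(kappas, costs, alpha, weights):
    """Per-dialogue performance: alpha * N(kappa) - sum_i w_i * N(c_i).

    kappas  -- one Kappa value per dialogue
    costs   -- dict mapping a cost name to a list with one value per dialogue
    alpha   -- weight on normalized task success (hypothetical here)
    weights -- dict mapping each cost name to its weight (hypothetical here)
    """
    norm_kappa = z_scores(kappas)
    norm_costs = {name: z_scores(vals) for name, vals in costs.items()}
    return [
        alpha * norm_kappa[d]
        - sum(weights[name] * norm_costs[name][d] for name in costs)
        for d in range(len(kappas))
    ]


if __name__ == "__main__":
    # Two illustrative dialogues: one with perfect task success and few
    # turns, one with chance-level success and many turns.
    ks = [kappa([[10, 0], [0, 10]]), kappa([[5, 5], [5, 5]])]
    print(performance(ks, {"turns": [10, 20]}, alpha=1.0, weights={"turns": 1.0}))
```

Because both task success and costs are normalized before being combined, measures on different scales can share one function. Kappa's correction for chance agreement is what allows comparison across tasks of differing complexity.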
