PARADISE: A Framework for Evaluating Spoken Dialogue Agents

This paper presents PARADISE (PARAdigm for DIalogue System Evaluation), a general framework for evaluating spoken dialogue agents. The framework decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to performance, and makes it possible to compare agents performing different tasks by normalizing for task complexity.
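The abstract does not give the scoring formula, so the following is only a minimal sketch of how a performance score normalized for task complexity could be computed. It assumes a weighted combination of a chance-corrected task success measure (Cohen's kappa here) and z-score-normalized dialogue costs. The function names, the weights `alpha` and `weights`, and the example cost measure `turns` are illustrative assumptions, not taken from the text above.

```python
from statistics import mean, stdev


def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix of counts.

    Rows and columns are the two sets of labels being compared
    (for example, intended vs. actually exchanged task values).
    """
    total = sum(sum(row) for row in confusion)
    n = len(confusion)
    # Observed agreement: share of counts on the diagonal.
    p_agree = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement from row and column marginals.
    row_sums = [sum(confusion[i]) for i in range(n)]
    col_sums = [sum(confusion[r][c] for r in range(n)) for c in range(n)]
    p_chance = sum(row_sums[i] * col_sums[i] for i in range(n)) / (total * total)
    return (p_agree - p_chance) / (1 - p_chance)


def z_scores(values):
    """Normalize values to zero mean and unit (sample) standard deviation."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values identical: nothing to distinguish, so return zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def performance(kappas, costs, alpha, weights):
    """Per-dialogue score: alpha * N(kappa) - sum_i weights[i] * N(cost_i).

    kappas  : list of task success values, one per dialogue
    costs   : dict mapping a cost name to a list of values, one per dialogue
    alpha   : assumed weight on normalized task success
    weights : dict mapping each cost name to its assumed weight
    """
    norm_kappa = z_scores(kappas)
    norm_costs = {name: z_scores(vals) for name, vals in costs.items()}
    scores = []
    for d in range(len(kappas)):
        penalty = sum(weights[name] * norm_costs[name][d] for name in costs)
        scores.append(alpha * norm_kappa[d] - penalty)
    return scores


if __name__ == "__main__":
    # Hypothetical data: three dialogues with one cost measure.
    scores = performance(
        kappas=[0.2, 0.5, 0.8],
        costs={"turns": [30, 20, 10]},
        alpha=1.0,
        weights={"turns": 1.0},
    )
    print(scores)
```

Because both success and costs are z-score normalized before weighting, measures on different scales (a kappa between -1 and 1, a turn count in the tens) contribute on a comparable footing. In practice the weights would need to be estimated from data rather than set by hand as in this sketch.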
