Paper Information - Empirical Methods for Evaluating Dialog Systems

We examine what purpose a dialog metric serves and then propose empirical methods for evaluating systems that meet that purpose. The methods include a protocol for conducting a Wizard-of-Oz experiment and a basic set of descriptive statistics for substantiating performance claims, using the data collected from the experiment as an ideal benchmark or "gold standard" for comparative judgments. The methods also provide a practical means of optimizing the system through component analysis and cost valuation.

Tim Paek

[1] Marilyn A. Walker, et al. PARADISE: A Framework for Evaluating Spoken Dialogue Agents, 1997, ACL.

[2] H. H. Clark, et al. Collaborating on contributions to conversations, 1987.

[3] Eric Horvitz, et al. Conversation as Action Under Uncertainty, 2000, UAI.

[4] Niels Ole Bernsen, et al. Designing interactive speech systems - from first ideas to user testing, 1998.

[5] Victor Zue, et al. Data collection and performance evaluation of spoken dialogue systems: the MIT experience, 2000, INTERSPEECH.

[6] Shimei Pan, et al. Empirically Evaluating an Adaptable Spoken Dialogue System, 1999, ArXiv.

[7] Morena Danieli, et al. Metrics for Evaluating Dialogue Strategies in a Spoken Language System, 1996, ArXiv.

[8] Eric Horvitz. Uncertainty, Utility, and Understanding, 2000, Intelligent Tutoring Systems.

[9] Eric Horvitz, et al. Uncertainty, Utility, and Misunderstanding: A Decision-Theoretic Perspective on Grounding in Conversational Systems, 1999.

[10] Jean-Luc Gauvain, et al. Considerations in the design and evaluation of spoken language dialog systems, 2000, INTERSPEECH.

[11] Eric Horvitz, et al. A computational architecture for conversation, 1999.

[12] Gordon Miller, et al. Decision Making: Descriptive, Normative, and Prescriptive Interactions, 1990.

[13] Herbert H. Clark, et al. Contributing to Discourse, 1989, Cogn. Sci.

[14] Herbert H. Clark, et al. Grounding in communication, 1991, Perspectives on socially shared cognition.

[15] Marilyn A. Walker, et al. Evaluating spoken dialogue agents with PARADISE: Two case studies, 1998, Comput. Speech Lang.