Quantitative and Qualitative Evaluation of Darpa Communicator Spoken Dialogue Systems

This paper describes the application of the PARADISE evaluation framework to the corpus of 662 human-computer dialogues collected in the June 2000 Darpa Communicator data collection. We describe results based on the standard logfile metrics as well as results based on additional qualitative metrics derived using the DATE dialogue act tagging scheme. We show that performance models derived using the standard metrics can account for 37% of the variance in user satisfaction, and that adding the DATE metrics improves the models by an absolute 5%, that is, to 42% of the variance.
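The "variance accounted for" figures above are the R² of a regression model. PARADISE performance models are multiple linear regressions that predict user satisfaction from normalized (z-scored) metrics. The following is a minimal sketch of that comparison, not the authors' implementation: the data is synthetic, the metric names (`task_success`, `duration`, `extra_metric`) are hypothetical stand-ins for logfile and DATE-style metrics, and the resulting R² values have no relation to the paper's results.

```python
import random

def zscore(values):
    """Normalize a metric to zero mean and unit standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_linear(columns, target):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(target))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)

def r_squared(columns, target, coefs):
    """Fraction of the variance in target explained by the fitted model."""
    mean_y = sum(target) / len(target)
    preds = [coefs[0] + sum(c * col[i] for c, col in zip(coefs[1:], columns))
             for i in range(len(target))]
    ss_res = sum((y - p) ** 2 for y, p in zip(target, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot

# Synthetic dialogues (an assumption for illustration only, not the Communicator corpus).
rng = random.Random(0)
n = 200
task_success = zscore([float(rng.random() < 0.7) for _ in range(n)])
duration = zscore([rng.gauss(300.0, 60.0) for _ in range(n)])
extra_metric = zscore([rng.gauss(0.0, 1.0) for _ in range(n)])  # stands in for a qualitative metric
satisfaction = [1.0 * s - 0.5 * d + 0.4 * e + rng.gauss(0.0, 1.0)
                for s, d, e in zip(task_success, duration, extra_metric)]

# Baseline model: standard metrics only. Full model: standard metrics plus the extra metric.
base_cols = [task_success, duration]
full_cols = base_cols + [extra_metric]
r2_base = r_squared(base_cols, satisfaction, fit_linear(base_cols, satisfaction))
r2_full = r_squared(full_cols, satisfaction, fit_linear(full_cols, satisfaction))
print(f"R^2 baseline: {r2_base:.3f}  R^2 with extra metric: {r2_full:.3f}")
```

The difference `r2_full - r2_base` is the kind of absolute improvement in explained variance that the abstract reports. On the same data, adding a predictor to a least-squares model can never lower R², so a real evaluation would also check whether the added metric is a statistically significant predictor.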
