Paper information - Survey on evaluation methods for dialogue systems

Survey on evaluation methods for dialogue systems

In this paper, we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is itself a crucial part of the development process. Dialogue systems are often evaluated by means of human evaluations and questionnaires. However, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the involvement of human labour. In this survey, we present the main concepts and methods. To this end, we differentiate between the various classes of dialogue systems (task-oriented, conversational, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for its dialogue systems and then presenting the evaluation methods for that class.
