Survey on evaluation methods for dialogue systems

In this paper, we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Dialogue systems are often evaluated by means of human evaluations and questionnaires, which tend to be very cost- and time-intensive. Much work has therefore gone into finding methods that reduce the involvement of human labour. In this survey, we present the main concepts and methods. To this end, we differentiate between the various classes of dialogue systems (task-oriented, conversational, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for its dialogue systems and then presenting the evaluation methods for that class.
