Evaluating Conversational Recommender Systems via User Simulation

Conversational information access is an emerging research area. End-to-end evaluation of such systems currently relies on human judges, which is both time- and resource-intensive at scale, and has thus become a bottleneck to progress. As an alternative, we propose automated evaluation by means of simulated users. Our user simulator aims to generate the responses a real human would give, by modeling both individual preferences and the general flow of interaction with the system. We evaluate our simulation approach on an item recommendation task by comparing three existing conversational recommender systems. We show that preference modeling and task-specific interaction models each contribute to more realistic simulations, and can help achieve high correlation between automatic evaluation measures and manual human assessments.
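To make the two ingredients concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation) of a simulated user that combines a preference model (which items the user likes) with a simple interaction model (a rule over dialogue acts that decides how to react to the agent and when to give up). All class and method names are illustrative assumptions.

```python
import random

class SimulatedUser:
    """Toy user simulator: preference model + rule-based interaction model.

    This is an illustrative sketch only; the names and logic here are
    assumptions, not the system described in the abstract.
    """

    def __init__(self, liked_items, patience=3):
        self.liked_items = set(liked_items)  # preference model: items the user likes
        self.turns_left = patience           # interaction model: quit after N rejections

    def respond(self, agent_act, item=None):
        # Interaction model: pick the user's dialogue act given the agent's act.
        if agent_act == "elicit":
            # Disclose one of the user's preferred items when asked.
            return ("disclose", random.choice(sorted(self.liked_items)))
        if agent_act == "recommend":
            # Preference model: accept recommendations of liked items.
            if item in self.liked_items:
                return ("accept", item)
            self.turns_left -= 1
            # Interaction model: reject until patience runs out, then quit.
            return ("reject", item) if self.turns_left > 0 else ("quit", None)
        return ("neutral", None)

# Example interaction with a (hypothetical) recommender agent:
user = SimulatedUser(liked_items=["Inception", "Alien"])
act, payload = user.respond("recommend", "Inception")
```

Running many such simulated dialogues against each recommender yields automatic measures (e.g. success rate, turns to acceptance) that can then be correlated with human assessments, as the abstract describes.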
