Dialogue Learning With Human-In-The-Loop

An important aspect of developing conversational agents is to give a bot the ability to improve through communicating with humans and to learn from the mistakes that it makes. Most research has focused on learning from fixed training sets of labeled data rather than interacting with a dialogue partner in an online fashion. In this paper we explore this direction in a reinforcement learning setting where the bot improves its question-answering ability from feedback a teacher gives following its generated responses. We build a simulator that tests various aspects of such learning in a synthetic environment, and introduce models that work in this regime. Finally, real experiments with Mechanical Turk validate the approach.

[1]  M. Werts,et al.  Instructive feedback: Review of parameters and effects , 1995 .

[2]  Roberto Pieraccini,et al.  Learning dialogue strategies within the Markov decision process framework , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[3]  Marilyn A. Walker,et al.  An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email , 2000, J. Artif. Intell. Res..

[4]  Marilyn A. Walker,et al.  Empirical Evaluation of a Reinforcement Learning Spoken Dialogue System , 2000, AAAI/IAAI.

[5]  Roberto Pieraccini,et al.  A stochastic model of human-machine interaction for learning dialog strategies , 2000, IEEE Trans. Speech Audio Process..

[6]  S. Singh,et al.  Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System , 2011, J. Artif. Intell. Res..

[7]  Marilyn A. Walker,et al.  A trainable generator for recommendations in multimodal dialog , 2003, INTERSPEECH.

[8]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[9]  Steve J. Young,et al.  A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies , 2006, The Knowledge Engineering Review.

[10]  David Suendermann-Oeft,et al.  Are We There Yet? Research in Commercial Spoken Dialog Systems , 2009, TSD.

[11]  Milica Gasic,et al.  The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management , 2010, Comput. Speech Lang..

[12]  Mohammad Amin Bassiri Interactional Feedback and the Impact of Attitude and Motivation on Noticing L2 Form , 2011 .

[13]  Dongho Kim,et al.  POMDP-based dialogue manager adaptation to extended domains , 2013, SIGDIAL Conference.

[14]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[15]  Joaquin Quiñonero Candela,et al.  Counterfactual reasoning and learning systems: the example of computational advertising , 2013, J. Mach. Learn. Res..

[16]  Milica Gasic,et al.  POMDP-Based Statistical Spoken Dialog Systems: A Review , 2013, Proceedings of the IEEE.

[17]  Dongho Kim,et al.  Incremental on-line adaptation of POMDP-based dialogue managers to extended domains , 2014, INTERSPEECH.

[18]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[19]  Wojciech Zaremba,et al.  Reinforcement Learning Neural Turing Machines - Revised , 2015 .

[20]  Jason Weston,et al.  Large-scale Simple Question Answering with Memory Networks , 2015, ArXiv.

[21]  Phil Blunsom,et al.  Teaching Machines to Read and Comprehend , 2015, NIPS.

[22]  Wojciech Zaremba,et al.  Reinforcement Learning Neural Turing Machines , 2015, ArXiv.

[23]  Jian Zhang,et al.  SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.

[24]  David Vandyke,et al.  Continuously Learning Neural Dialogue Management , 2016, ArXiv.

[25]  Marc'Aurelio Ranzato,et al.  Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.

[26]  Jason Weston,et al.  Dialog-based Language Learning , 2016, NIPS.

[27]  Xinyun Chen Under Review as a Conference Paper at Iclr 2017 Delving into Transferable Adversarial Ex- Amples and Black-box Attacks , 2016 .

[28]  Jason Weston,et al.  Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks , 2015, ICLR.

[29]  Jason Weston,et al.  Key-Value Memory Networks for Directly Reading Documents , 2016, EMNLP.

[30]  Xiang Zhang,et al.  Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems , 2015, ICLR.