Interactive reinforcement learning for task-oriented dialogue management

Dialogue management is the component of a dialogue system that determines the optimal action for the system to take at each turn. An important consideration for a dialogue manager is its ability to adapt to user behaviors unseen during training. In this paper, we investigate policy-gradient-based methods for interactive reinforcement learning, in which the agent receives action-specific feedback from the user and incorporates this feedback into its policy. We show that using the feedback to shape the policy directly enables a dialogue manager to learn new interactions faster than interpreting the feedback as a reward value.
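
The contrast between the two uses of feedback can be pictured with a small sketch. It is an illustration under stated assumptions, not the method evaluated in the paper: the single-state softmax policy, the function names (`reinforce_step`, `shape_policy`), the learning rate, the `consistency` parameter, and the multiplicative way of combining policy and feedback are all hypothetical choices made here for clarity.

```python
import math


def softmax(logits):
    """Convert action preferences into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, reward, lr=0.5):
    """Feedback-as-reward (assumed form): one policy-gradient update.

    For a softmax policy, d/d logit_i of log pi(action) is
    (1 if i == action else 0) - pi(i), scaled here by the reward.
    """
    probs = softmax(logits)
    return [
        w + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (w, p) in enumerate(zip(logits, probs))
    ]


def shape_policy(probs, feedback_delta, consistency=0.9):
    """Feedback-as-policy-shaping (assumed form): reweight the policy.

    feedback_delta[i] is (#positive - #negative) feedback for action i.
    consistency is a hypothetical estimate of how often feedback is right.
    Each action's probability is multiplied by the estimated chance that
    the feedback marks it as good, then the result is renormalized.
    """
    good = [
        consistency ** d / (consistency ** d + (1.0 - consistency) ** d)
        for d in feedback_delta
    ]
    combined = [p * g for p, g in zip(probs, good)]
    total = sum(combined)
    return [c / total for c in combined]


if __name__ == "__main__":
    logits = [0.0, 0.0, 0.0]          # three candidate system actions
    base = softmax(logits)
    # The user approves action 2 once.
    as_reward = softmax(reinforce_step(logits, action=2, reward=1.0))
    as_shaping = shape_policy(base, feedback_delta=[0, 0, 1])
    print("base policy:      ", base)
    print("feedback as reward:", as_reward)
    print("feedback shaping:  ", as_shaping)
```

In both variants a single positive signal raises the probability of the approved action. Which one adapts faster depends on the learning rate, the assumed feedback consistency, and the task, so the sketch shows only the mechanics and makes no claim about the paper's results.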
