Policy Adaptation for Deep Reinforcement Learning-Based Dialogue Management

Policy optimization is the core of statistical dialogue management. Deep reinforcement learning has been used successfully to optimize dialogue policies for a static, pre-defined domain. However, when the domain changes dynamically, e.g. when a previously unseen concept (or slot) that can be used as a database search constraint is added, or when the policy for one domain is transferred to another, both the dialogue state space and the action set change. The model structures for different domains therefore have to differ, which makes dialogue policy adaptation and transfer challenging. Here a multi-agent dialogue policy (MADP) is proposed to tackle these problems. MADP consists of several slot-dependent agents (S-Agents) and one slot-independent agent (G-Agent). The S-Agents share a common set of parameters, and each also has its own private parameters. During policy transfer, the shared S-Agent parameters and the G-Agent parameters can be transferred directly to the agents in the extended or new domain. Simulation experiments showed that MADP can significantly speed up policy learning and facilitate policy adaptation.
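
The parameter transfer described above can be illustrated with a minimal sketch. It is not the paper's implementation: the class `MADPParams`, the method `transfer_to`, the slot names, and the use of plain Python lists in place of network weights are all hypothetical. The handling of private parameters (kept for slots present in both domains, freshly initialised for unseen slots) is likewise an assumption, since the abstract only states that the shared S-Agent parameters and the G-Agent parameters are transferred directly.

```python
import copy
import random


def init_params(size, rng):
    """Small random vector standing in for a network's weights."""
    return [rng.uniform(-0.1, 0.1) for _ in range(size)]


class MADPParams:
    """Hypothetical parameter container; not the paper's implementation."""

    def __init__(self, slots, size=4, seed=0):
        self.rng = random.Random(seed)
        self.size = size
        self.g_agent = init_params(size, self.rng)    # slot-independent agent
        self.s_shared = init_params(size, self.rng)   # shared by all S-Agents
        self.s_private = {s: init_params(size, self.rng) for s in slots}

    def transfer_to(self, target_slots):
        """Build parameters for an extended or new domain."""
        # The deep copy carries over the G-Agent and shared S-Agent parameters.
        target = copy.deepcopy(self)
        # Assumption: private parameters are kept for slots present in both
        # domains and freshly initialised for previously unseen slots.
        target.s_private = {
            slot: copy.deepcopy(self.s_private[slot]) if slot in self.s_private
            else init_params(self.size, target.rng)
            for slot in target_slots
        }
        return target


# Extend a two-slot domain with one previously unseen slot.
source = MADPParams(["food", "area"])
extended = source.transfer_to(["food", "area", "pricerange"])
```

Because only the per-slot private parameters depend on the slot inventory, the set of S-Agents can grow or change in this sketch without altering the shape of the transferred parameters.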
