Gaussian Process based Deep Dyna-Q Approach for Dialogue Policy Learning

August 1–6, 2021. ©2021 Association for Computational Linguistics

Guanlin Wu 1,2,∗; Wenqi Fang 3,∗,†; Ji Wang 1,∗; Jiang Cao 2; Weidong Bao 1; Yang Ping 2; Xiaomin Zhu 1; Zheng Wang 4

1 National University of Defense Technology, Changsha, China
2 Academy of Military Science, Beijing, China
3 Nanhu Laboratory, Jiaxing, China
4 Shenzhen Institutes of Advanced Technology, CAS, Shenzhen, China

{wuguanlin16,wangji,wdbao,xmzhu}@nudt.edu.cn, ocean.py@163.com, amscaojiang@126.com, wqfang@nanhulab.ac.cn, zheng.wang@siat.ac.cn

Abstract
