Paper Information - Gaussian Process based Deep Dyna-Q approach for Dialogue Policy Learning - 字舞流文

Gaussian Process based Deep Dyna-Q approach for Dialogue Policy Learning

August 1–6, 2021. ©2021 Association for Computational Linguistics, p. 1786

Authors: Guanlin Wu 1,2,∗, Wenqi Fang 3,∗,†, Ji Wang 1,∗, Jiang Cao 2, Weidong Bao 1, Yang Ping 2, Xiaomin Zhu 1, Zheng Wang 4

Affiliations:
1 National University of Defense Technology, Changsha, China
2 Academy of Military Science, Beijing, China
3 Nanhu Laboratory, Jiaxing, China
4 Shenzhen Institutes of Advanced Technology, CAS, Shenzhen, China

Contact: {wuguanlin16,wangji,wdbao,xmzhu}@nudt.edu.cn, ocean.py@163.com, amscaojiang@126.com, wqfang@nanhulab.ac.cn, zheng.wang@siat.ac.cn

Xiaomin Zhu | Weidong Bao | Wenqi Fang | Zheng Wang | Ji Wang | Guanlin Wu | Jiang Cao | Yang Ping

[1] Jason Weston, et al. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks, 2015, ICLR.

[2] Rui Zhang, et al. Dynamic Reward-Based Dueling Deep Dyna-Q: Robust Policy Learning in Noisy Environments, 2020, AAAI.

[3] Xiang Zhou, et al. Deep Reinforcement Learning for On-line Dialogue State Tracking, 2020, ArXiv.

[4] David Vandyke, et al. On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems, 2016, ACL.

[5] David Duvenaud, et al. Automatic model construction with Gaussian processes, 2014.

[6] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[7] Sriparna Saha, et al. Towards sentiment aided dialogue policy learning for multi-intent conversations using hierarchical reinforcement learning, 2020, PloS one.

[8] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.

[9] Demis Hassabis, et al. Mastering Atari, Go, chess and shogi by planning with a learned model, 2019, Nature.

[10] Malte Kuß, et al. Gaussian process models for robust regression, classification, and reinforcement learning, 2006.

[11] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[12] Jianfeng Gao, et al. Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning, 2018, EMNLP.

[13] David Vandyke, et al. Dialogue manager domain adaptation using Gaussian process reinforcement learning, 2016, Comput. Speech Lang.

[14] Roberto Pieraccini, et al. Learning dialogue strategies within the Markov decision process framework, 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[15] Arno Solin, et al. Variational Fourier Features for Gaussian Processes, 2016, J. Mach. Learn. Res.

[16] Shiqi Zhang, et al. Goal-Oriented Dialogue Policy Learning from Failures, 2018, AAAI.

[17] Kam-Fai Wong, et al. Integrating planning for task-completion dialogue policy learning, 2018, ACL.

[18] M. de Rijke, et al. Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems, 2020, FINDINGS.

[19] Jorge Nocedal, et al. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization, 1997, TOMS.

[20] Sebastian Ruder, et al. An overview of gradient descent optimization algorithms, 2016, Vestnik komp'iuternykh i informatsionnykh tekhnologii.

[21] Joseph Weizenbaum, et al. ELIZA: a computer program for the study of natural language communication between man and machine, 1966, CACM.

[22] Zachary Chase Lipton, et al. Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking, 2016.

[23] Elliot J. Crowley, et al. Deep Kernel Transfer in Gaussian Processes for Few-shot Learning, 2019, ArXiv.

[24] Carl E. Rasmussen, et al. Gaussian processes for machine learning, 2005, Adaptive computation and machine learning.

[25] Joelle Pineau, et al. A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues, 2016, AAAI.

[26] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[27] Jason Weston, et al. End-To-End Memory Networks, 2015, NIPS.

[28] Kam-Fai Wong, et al. Learning Efficient Dialogue Policy from Demonstrations through Shaping, 2020, ACL.

[29] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.