Gaussian Process based Deep Dyna-Q approach for Dialogue Policy Learning
Xiaomin Zhu | Weidong Bao | Wenqi Fang | Zheng Wang | Ji Wang | Guanlin Wu | Jiang Cao | Yang Ping