Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management
Kai Yu | Zhi Chen | Lu Chen | Xiaoyuan Liu
[1] Jianfeng Gao,et al. End-to-End Task-Completion Neural Dialogue Systems , 2017, IJCNLP.
[2] Milica Gasic,et al. The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management , 2010, Comput. Speech Lang.
[3] Jing He,et al. Policy Networks with Two-Stage Training for Dialogue Systems , 2016, SIGDIAL Conference.
[4] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
[5] Shane Legg,et al. Noisy Networks for Exploration , 2017, ICLR.
[6] Yannis Stylianou,et al. Single-Model Multi-domain Dialogue Management with Deep Learning , 2017, IWSDS.
[7] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[8] E. Ionides. Truncated Importance Sampling , 2008.
[9] Steve J. Young,et al. Natural actor and belief critic: Reinforcement algorithm for learning parameters of dialogue systems modelled as POMDPs , 2011, TSLP.
[10] Pei-Hao Su,et al. Sample Efficient Deep Reinforcement Learning for Dialogue Systems With Large Action Spaces , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[11] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[12] David Vandyke,et al. PyDial: A Multi-domain Statistical Dialogue System Toolkit , 2017, ACL.
[13] Stefan Ultes,et al. Feudal Reinforcement Learning for Dialogue Management in Large Domains , 2018, NAACL.
[14] Chong Wang,et al. Subgoal Discovery for Hierarchical Dialogue Policy Learning , 2018, EMNLP.
[15] Jianfeng Gao,et al. Efficient Exploration for Dialog Policy Learning with Deep BBQ Networks & Replay Buffer Spiking , 2016, ArXiv.
[16] Zhiyuan Liu,et al. Graph Neural Networks: A Review of Methods and Applications , 2018, AI Open.
[17] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[18] Jun S. Liu,et al. Metropolized independent sampling with comparisons to rejection sampling and importance sampling , 1996, Stat. Comput.
[19] Stefan Ultes,et al. Feudal Dialogue Management with Jointly Learned Feature Extractors , 2018, SIGDIAL Conference.
[20] Yannis Stylianou,et al. Learning Domain-Independent Dialogue Policies via Ontology Parameterisation , 2015, SIGDIAL Conference.
[21] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[22] Stefan Ultes,et al. A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management , 2017, ArXiv.
[23] Zachary Chase Lipton,et al. Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking , 2016.
[24] Tanja Schultz,et al. Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers , 2007, HLT-NAACL 2007.
[25] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[26] Stefan Ultes,et al. Sub-domain Modelling for Dialogue Management with Hierarchical Reinforcement Learning , 2017, SIGDIAL Conference.
[27] Lu Chen,et al. Structured Dialogue Policy with Graph Neural Networks , 2018, COLING.
[28] Kam-Fai Wong,et al. Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[30] Hui Ye,et al. Agenda-Based User Simulation for Bootstrapping a POMDP Dialogue System , 2007, NAACL.
[31] Stefan Ultes,et al. Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management , 2017, SIGDIAL Conference.
[32] Philip S. Yu,et al. A Comprehensive Survey on Graph Neural Networks , 2019, IEEE Transactions on Neural Networks and Learning Systems.
[33] Jianfeng Gao,et al. BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems , 2016, AAAI.
[34] Radford M. Neal. Annealed importance sampling , 1998, Stat. Comput.
[35] Timothy Baldwin,et al. Semi-supervised User Geolocation via Graph Convolutional Networks , 2018, ACL.
[36] Doina Precup,et al. Intra-Option Learning about Temporally Abstract Actions , 1998, ICML.
[37] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[38] Kam-Fai Wong,et al. Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning , 2017, EMNLP.
[39] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[40] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[41] Dongho Kim,et al. Distributed dialogue policies for multi-domain statistical dialogue management , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Xiang Zhou,et al. Agent-Aware Dropout DQN for Safe and Efficient On-line Dialogue Policy Learning , 2017, EMNLP.