Improving Sample-Efficiency in Reinforcement Learning for Dialogue Systems by Using Trainable-Action-Mask
[1] Jason Weston, et al. Dialogue Learning With Human-In-The-Loop, 2016, ICLR.
[2] Satinder Singh, et al. Value Prediction Network, 2017, NIPS.
[3] Kam-Fai Wong, et al. Integrating planning for task-completion dialogue policy learning, 2018, ACL.
[4] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[5] David Vandyke, et al. PyDial: A Multi-domain Statistical Dialogue System Toolkit, 2017, ACL.
[6] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[7] Steve J. Young, et al. The Hidden Agenda User Simulation Model, 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[8] Milica Gasic, et al. POMDP-Based Statistical Spoken Dialog Systems: A Review, 2013, Proceedings of the IEEE.
[9] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[10] David Vandyke, et al. On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems, 2016, ACL.
[11] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[12] Alborz Geramifard, et al. Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping, 2008, UAI.
[13] Sergey Levine, et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[14] Stefan Ultes, et al. A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management, 2017, ArXiv.
[15] Tom Schaul, et al. The Predictron: End-To-End Learning and Planning, 2016, ICML.
[16] Jianfeng Gao, et al. Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning, 2018, EMNLP.
[17] Hui Ye, et al. Agenda-Based User Simulation for Bootstrapping a POMDP Dialogue System, 2007, NAACL.
[18] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[19] Milica Gasic, et al. On-line policy optimisation of spoken dialogue systems via live interaction with human subjects, 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[20] Pieter Abbeel, et al. Value Iteration Networks, 2016, NIPS.