Actor-Double-Critic: Incorporating Model-Based Critic for Task-Oriented Dialogue Systems
[1] David Vandyke, et al. Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems, 2015, EMNLP.
[2] Stefan Ultes, et al. A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management, 2017, ArXiv.
[3] Hui Ye, et al. Agenda-Based User Simulation for Bootstrapping a POMDP Dialogue System, 2007, NAACL.
[4] Jianfeng Gao, et al. Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning, 2018, EMNLP.
[5] Jason D. Williams, et al. The best of both worlds: unifying conventional dialog systems and POMDPs, 2008, INTERSPEECH.
[6] Xiang Zhou, et al. Affordable On-line Dialogue Policy Learning, 2017, EMNLP.
[7] Stefan Ultes, et al. Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management, 2017, SIGDIAL Conference.
[8] Alborz Geramifard, et al. Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping, 2008, UAI.
[9] Jing He, et al. Policy Networks with Two-Stage Training for Dialogue Systems, 2016, SIGDIAL Conference.
[10] Roberto Pieraccini, et al. Learning dialogue strategies within the Markov decision process framework, 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.
[11] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[12] Dilek Z. Hakkani-Tür, et al. Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems, 2018, NAACL.
[13] Jason Weston, et al. Dialogue Learning With Human-In-The-Loop, 2016, ICLR.
[14] Enhong Chen, et al. Budgeted Policy Learning for Task-Oriented Dialogue Systems, 2019, ACL.
[15] Sergey Levine, et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[16] Milica Gasic, et al. On-line policy optimisation of spoken dialogue systems via live interaction with human subjects, 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[17] Satinder Singh, et al. Value Prediction Network, 2017, NIPS.
[18] Milica Gasic, et al. Gaussian Processes for Fast Policy Optimisation of POMDP-based Dialogue Managers, 2010, SIGDIAL Conference.
[19] Kam-Fai Wong, et al. Integrating planning for task-completion dialogue policy learning, 2018, ACL.
[20] Matthew Henderson, et al. N-best error simulation for training spoken dialogue systems, 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).
[21] Tom Schaul, et al. The Predictron: End-To-End Learning and Planning, 2016, ICML.
[22] David Vandyke, et al. PyDial: A Multi-domain Statistical Dialogue System Toolkit, 2017, ACL.
[23] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[24] Pieter Abbeel, et al. Value Iteration Networks, 2016, NIPS.
[25] Tsung-Hsien Wen, et al. Neural Belief Tracker: Data-Driven Dialogue State Tracking, 2016, ACL.
[26] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[27] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[28] Lihong Li, et al. Neural Approaches to Conversational AI, 2019, Found. Trends Inf. Retr..
[29] Jianfeng Gao, et al. Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access, 2016, ACL.
[30] David Vandyke, et al. On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems, 2016, ACL.
[31] Minlie Huang, et al. Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog, 2019, EMNLP.
[32] Yen-Chen Wu, et al. Improving Sample-Efficiency in Reinforcement Learning for Dialogue Systems by Using Trainable-Action-Mask, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Steve J. Young, et al. The Hidden Agenda User Simulation Model, 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[34] Milica Gasic, et al. POMDP-Based Statistical Spoken Dialog Systems: A Review, 2013, Proceedings of the IEEE.
[35] Yiming Yang, et al. Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning, 2018, AAAI.
[36] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.