Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning
Yiming Yang | Jianfeng Gao | Xiujun Li | Yuexin Wu | Jingjing Liu
[1] Zachary Chase Lipton,et al. Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking , 2016.
[3] Helen F. Hastie,et al. A survey on metrics for the evaluation of user simulations , 2012, The Knowledge Engineering Review.
[5] Xiaodong Liu,et al. Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval , 2015, NAACL.
[6] Gökhan Tür,et al. Multi-Domain Joint Semantic Frame Parsing Using Bi-Directional RNN-LSTM , 2016, INTERSPEECH.
[7] François Laviolette,et al. Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res.
[8] David Vandyke,et al. Continuously Learning Neural Dialogue Management , 2016, ArXiv.
[9] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[10] Jianfeng Gao,et al. Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access , 2016, ACL.
[11] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[13] Lihong Li,et al. Neural Approaches to Conversational AI , 2019, Found. Trends Inf. Retr.
[14] Tsung-Hsien Wen,et al. Neural Belief Tracker: Data-Driven Dialogue State Tracking , 2016, ACL.
[15] Jianfeng Gao,et al. BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems , 2016, AAAI.
[16] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[17] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[18] Jianfeng Gao,et al. A User Simulator for Task-Completion Dialogues , 2016, ArXiv.
[19] David Vandyke,et al. Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems , 2015, EMNLP.
[20] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[21] Hui Ye,et al. Agenda-Based User Simulation for Bootstrapping a POMDP Dialogue System , 2007, NAACL.
[22] Jianfeng Gao,et al. Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning , 2018, EMNLP.
[23] Roberto Pieraccini,et al. Learning dialogue strategies within the Markov decision process framework , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.
[24] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[25] Kam-Fai Wong,et al. Integrating planning for task-completion dialogue policy learning , 2018, ACL.
[26] Benjamin Van Roy,et al. A Tutorial on Thompson Sampling , 2017, Found. Trends Mach. Learn.
[27] Milica Gasic,et al. POMDP-Based Statistical Spoken Dialog Systems: A Review , 2013, Proceedings of the IEEE.
[28] Jianfeng Gao,et al. End-to-End Task-Completion Neural Dialogue Systems , 2017, IJCNLP.