[1] Thomas G. Dietterich, et al. In Advances in Neural Information Processing Systems 12, 1991, NIPS.
[2] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[3] Andrew W. Moore, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.
[4] Erik Talvitie, et al. Model Regularization for Stable Sample Rollouts, 2014, UAI.
[5] Kam-Fai Wong, et al. Integrating planning for task-completion dialogue policy learning, 2018, ACL.
[6] Daan Wierstra, et al. Recurrent Environment Simulators, 2017, ICLR.
[7] Sergey Levine, et al. Recall Traces: Backtracking Models for Efficient Reinforcement Learning, 2018, ICLR.
[8] Martha White, et al. Organizing Experience: a Deeper Look at Replay Mechanisms for Sample-Based Planning in Continuous State Domains, 2018, IJCAI.
[9] Marc G. Bellemare, et al. Skip Context Tree Switching, 2014, ICML.
[10] Long-Ji Lin. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching, 1992, Machine Learning.
[11] Honglak Lee, et al. Action-Conditional Video Prediction using Deep Networks in Atari Games, 2015, NIPS.
[12] Pieter Abbeel, et al. Value Iteration Networks, 2016, NIPS.
[13] Doina Precup, et al. Dyna Planning using a Feature Based Generative Model, 2018, ArXiv.
[14] Ben Calderhead, et al. Advances in Neural Information Processing Systems 29, 2016.
[15] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[16] V. Kaul, et al. Planning, 2012.
[17] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[18] Kilian Q. Weinberger, et al. Proceedings of the 33rd International Conference on Machine Learning - Volume 48, 2016.
[19] John Langford, et al. Proceedings of the 29th International Conference on Machine Learning (ICML-12), 2012, ArXiv.
[20] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[21] Razvan Pascanu, et al. Imagination-Augmented Agents for Deep Reinforcement Learning, 2017, NIPS.
[22] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[23] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[24] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[25] Peter A. Flach, et al. Proceedings of the 28th International Conference on Machine Learning, 2011.
[26] Shalabh Bhatnagar, et al. Multi-Step Dyna Planning for Policy Evaluation and Control, 2009, NIPS.
[27] Gabriel Kalweit, et al. Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning, 2017, CoRL.
[28] Alborz Geramifard, et al. Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping, 2008, UAI.
[29] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[30] J. Andrew Bagnell, et al. Agnostic System Identification for Model-Based Reinforcement Learning, 2012, ICML.
[31] Marc G. Bellemare, et al. Bayesian Learning of Recursively Factored Environments, 2013, ICML.
[32] Satinder Singh, et al. Value Prediction Network, 2017, NIPS.
[33] Ram Ramamoorthy, et al. Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, 2014.
[34] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[35] Marlos C. Machado, et al. State of the Art Control of Atari Games Using Shallow Reinforcement Learning, 2015, AAMAS.
[36] Jean-Arcady Meyer, et al. Adaptive Behavior, 2005.
[37] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[38] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.
[39] Peter A. Flach, et al. Advances in Neural Information Processing Systems 28, 2015.
[40] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[41] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, J. Artif. Intell. Res.
[42] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[43] J. Peng, et al. Efficient Learning and Planning Within the Dyna Framework, 1993, IEEE International Conference on Neural Networks.