Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning
Vladimir Feinberg | Alvin Wan | Ion Stoica | Michael I. Jordan | Joseph E. Gonzalez | Sergey Levine