[1] Emma Brunskill,et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds , 2019, ICML.
[2] Sergey Levine,et al. Model-Based Reinforcement Learning for Atari , 2019, ICLR.
[3] Demis Hassabis,et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play , 2018, Science.
[4] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[5] Sergey Levine,et al. When to Trust Your Model: Model-Based Policy Optimization , 2019, NeurIPS.
[6] Tengyu Ma,et al. Fixup Initialization: Residual Learning Without Normalization , 2019, ICLR.
[7] Erik Talvitie,et al. Model Regularization for Stable Sample Rollouts , 2014, UAI.
[8] Tom Schaul,et al. The Predictron: End-To-End Learning and Planning , 2016, ICML.
[9] Sergey Levine,et al. Guided Policy Search , 2013, ICML.
[10] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[11] Yuandong Tian,et al. Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees , 2018, ICLR.
[12] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[13] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[14] Nan Jiang,et al. Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches , 2018, COLT.
[15] Nikolai Matni,et al. Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator , 2018, NeurIPS.
[16] Honglak Lee,et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion , 2018, NeurIPS.
[17] Nikolai Matni,et al. On the Sample Complexity of the Linear Quadratic Regulator , 2017, Foundations of Computational Mathematics.
[18] Richard S. Sutton,et al. Dyna, an integrated architecture for learning, planning, and reacting , 1990, SIGART Bulletin.
[19] Yoshua Bengio,et al. Probabilistic Planning with Sequential Monte Carlo methods , 2018, ICLR.
[20] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[21] Balaraman Ravindran,et al. EPOpt: Learning Robust Neural Network Policies Using Model Ensembles , 2016, ICLR.
[22] Tamim Asfour,et al. Model-Based Reinforcement Learning via Meta-Policy Optimization , 2018, CoRL.
[23] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[24] Pieter Abbeel,et al. Model-Ensemble Trust-Region Policy Optimization , 2018, ICLR.
[25] Yilun Du,et al. Task-Agnostic Dynamics Priors for Deep Reinforcement Learning , 2019, ICML.
[26] Jimmy Ba,et al. Exploring Model-based Planning with Policy Networks , 2019, ICLR.
[27] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[28] Razvan Pascanu,et al. On the number of inference regions of deep feed forward networks with piece-wise linear activations , 2013, ICLR.
[29] Michael I. Jordan,et al. Is Q-learning Provably Efficient? , 2018, NeurIPS.
[30] Michael I. Jordan,et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning , 2018, ArXiv.
[31] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[32] Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.
[33] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[34] Sergey Levine,et al. Diagnosing Bottlenecks in Deep Q-learning Algorithms , 2019, ICML.
[35] Benjamin Recht,et al. The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint , 2018, COLT.
[36] Stefano Ermon,et al. Calibrated Model-Based Deep Reinforcement Learning , 2019, ICML.
[37] Sergey Levine,et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[38] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.
[39] Satinder Singh,et al. Value Prediction Network , 2017, NIPS.
[40] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[41] Sergey Levine,et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models , 2018, NeurIPS.