When to Trust Your Model: Model-Based Policy Optimization

Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy optimization both theoretically and empirically. We first formulate and analyze a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step. In practice, however, this analysis is overly pessimistic and suggests that real off-policy data is always preferable to model-generated on-policy data, but we show that an empirical estimate of model generalization can be incorporated into such an analysis to justify model usage. Motivated by this analysis, we then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls. In particular, this approach surpasses the sample efficiency of prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to horizons that cause other model-based methods to fail entirely.
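
The branched-rollout procedure described in the abstract can be sketched in a few lines of code. The sketch below is an illustrative reconstruction only, not the paper's implementation: the ReplayBuffer, toy_policy, and toy_model stand-ins are hypothetical placeholders, and in the actual algorithm the model would be a learned ensemble of dynamics models and the policy an off-policy model-free learner trained on the generated data.

```python
# Minimal sketch of short model-generated rollouts branched from real data.
# All components here are simplified stand-ins (plain numpy), not the
# paper's implementation.
import numpy as np

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state) transitions."""
    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.data = []

    def add(self, transition):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
        self.data.append(transition)

    def sample_states(self, n):
        idx = np.random.randint(len(self.data), size=n)
        return np.stack([self.data[i][0] for i in idx])


def branched_rollouts(env_buffer, model_buffer, model, policy, k=1, n_starts=64):
    """Branch k-step model rollouts from states drawn from real data.

    Keeping k small limits compounding model error while still producing
    on-policy data for the model-free learner.
    """
    states = env_buffer.sample_states(n_starts)
    for _ in range(k):
        actions = policy(states)
        next_states, rewards = model(states, actions)
        for s, a, r, s2 in zip(states, actions, rewards, next_states):
            model_buffer.add((s, a, r, s2))
        states = next_states


# --- toy stand-ins so the sketch runs end to end (hypothetical) -----------
state_dim, action_dim = 4, 2
toy_policy = lambda s: np.tanh(s @ np.random.randn(state_dim, action_dim))
toy_model = lambda s, a: (s + 0.1 * (a @ np.random.randn(action_dim, state_dim)),
                          -np.sum(s ** 2, axis=1))

env_buffer, model_buffer = ReplayBuffer(), ReplayBuffer()
for _ in range(256):  # pretend these states came from real environment interaction
    env_buffer.add((np.random.randn(state_dim), None, None, None))

branched_rollouts(env_buffer, model_buffer, toy_model, toy_policy, k=3)
print(f"generated {len(model_buffer.data)} model transitions")
# A model-free learner would then train on batches drawn from model_buffer,
# optionally mixed with real transitions from env_buffer.
```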
