Learning-based Model Predictive Control for Safe Reinforcement Learning

Reinforcement learning has been used successfully to solve difficult tasks in complex, unknown environments. However, these methods typically provide no safety guarantees, which prevents their use in safety-critical, real-world applications. In this paper, we attempt to bridge the gap between learning-based techniques, which are scalable and highly autonomous but often unsafe, and robust control techniques, which rest on a solid theoretical foundation that guarantees safety but often require extensive expert knowledge to identify the system and estimate disturbance sets. We combine a provably safe learning-based MPC scheme that allows for input-dependent uncertainties with techniques from model-based RL to solve tasks with only limited prior knowledge. We evaluate the resulting algorithm on a reinforcement learning task in a simulated cart-pole system with safety constraints.
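
The sketch below illustrates the general idea under loose assumptions; it is not the authors' implementation. A model predictive controller applies the first action of a candidate input sequence only if the predicted cart-pole trajectory, inflated by an input-dependent uncertainty margin, satisfies the state constraints. The nominal dynamics, the heuristic margin sigma * |u|, the random-shooting optimizer, and all constants are placeholders standing in for the learned model and robust MPC scheme of the paper.

# Illustrative sketch only: nominal cart-pole dynamics plus a heuristic
# input-dependent uncertainty margin stand in for the learned model and
# robust MPC scheme described in the paper.
import numpy as np

DT, G, M_CART, M_POLE, L_POLE = 0.02, 9.81, 1.0, 0.1, 0.5
X_LIMIT, THETA_LIMIT, U_LIMIT = 2.4, 0.21, 10.0  # assumed state and input constraints

def cartpole_step(s, u):
    # Euler-integrated cart-pole dynamics; plays the role of the "true" system here.
    x, x_dot, th, th_dot = s
    total_m = M_CART + M_POLE
    tmp = (u + M_POLE * L_POLE * th_dot ** 2 * np.sin(th)) / total_m
    th_acc = (G * np.sin(th) - np.cos(th) * tmp) / (
        L_POLE * (4.0 / 3.0 - M_POLE * np.cos(th) ** 2 / total_m))
    x_acc = tmp - M_POLE * L_POLE * th_acc * np.cos(th) / total_m
    return s + DT * np.array([x_dot, x_acc, th_dot, th_acc])

def predicted_safe(s, actions, sigma):
    # Reject a candidate sequence if the predicted trajectory, padded by an
    # input-dependent margin sigma * |u|, leaves the constraint set.
    for u in actions:
        s = cartpole_step(s, u)
        margin = sigma * abs(u)
        if abs(s[0]) + margin > X_LIMIT or abs(s[2]) + margin > THETA_LIMIT:
            return False
    return True

def safe_mpc_action(s, rng, horizon=15, n_samples=200, sigma=0.01):
    # Random-shooting MPC: among sampled sequences predicted to be safe,
    # apply the first action of the cheapest one; default to zero input otherwise.
    best_u, best_cost = 0.0, np.inf
    for _ in range(n_samples):
        actions = rng.uniform(-U_LIMIT, U_LIMIT, horizon)
        if not predicted_safe(s, actions, sigma):
            continue
        sim, cost = s.copy(), 0.0
        for u in actions:
            sim = cartpole_step(sim, u)
            cost += sim[2] ** 2 + 0.1 * sim[0] ** 2 + 1e-3 * u ** 2
        if cost < best_cost:
            best_cost, best_u = cost, actions[0]
    return best_u

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state, violations = np.array([0.0, 0.0, 0.05, 0.0]), 0
    for _ in range(300):
        state = cartpole_step(state, safe_mpc_action(state, rng))
        violations += int(abs(state[0]) > X_LIMIT or abs(state[2]) > THETA_LIMIT)
    print(f"constraint violations: {violations}, final angle: {state[2]:.3f} rad")

In the paper, the heuristic margin would be replaced by uncertainty estimates from the learned model (e.g., a Gaussian process) and the random-shooting loop by a robust MPC formulation; this toy version only conveys the structure of coupling a learned, input-dependent uncertainty with constraint checking inside the controller.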
