Safe Reinforcement Learning Using Robust MPC

Reinforcement learning (RL) has recently achieved impressive results in a wide range of applications. While the potential of RL is now well established, many critical aspects still need to be addressed, including safety and stability. These issues, often considered secondary by the RL community, are central to the control community, which has investigated them extensively. Model predictive control (MPC) is one of the most successful control techniques, not least because of its ability to provide such guarantees even for uncertain, constrained systems. Since MPC is an optimization-based technique, optimality has also often been claimed. Unfortunately, the performance of MPC depends heavily on the accuracy of the model used for predictions. In this article, we propose to combine RL and MPC so as to exploit the advantages of both, and thereby obtain a controller that is both optimal and safe. We illustrate the results with two numerical examples in simulation.
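The abstract only summarizes the mechanism, so a rough sketch may help fix the idea: an MPC scheme with a tunable parameter serves as the RL policy, the MPC constraints enforce safety at every step, and an RL update adjusts the parameter from closed-loop data to compensate for model mismatch. This is a minimal illustration, not the paper's implementation: the scalar plant, the mismatched prediction model, the costs, the finite-difference gradient estimator, and the learning rate are all invented here, and the paper's robust-MPC safety machinery is reduced to hard input bounds.

```python
# Sketch: a parametrized MPC acts as the policy; RL tunes its terminal
# weight from closed-loop cost. All numbers below are illustrative.
import numpy as np
from scipy.optimize import minimize

# True (unknown) scalar plant: x+ = a*x + b*u + w, with bounded noise w.
a_true, b_true = 1.0, 0.5
# Inaccurate model used inside the MPC (deliberate model-plant mismatch).
a_hat, b_hat = 0.9, 0.6

N, u_max, r = 10, 1.0, 0.1  # horizon, input bound, input weight

def mpc_policy(x0, theta):
    """Solve the parametrized MPC and return the first input.
    theta scales the terminal penalty; RL adjusts it so that the
    closed-loop behavior improves despite the wrong model."""
    def cost(u_seq):
        x, J = x0, 0.0
        for u in u_seq:
            J += x**2 + r * u**2
            x = a_hat * x + b_hat * u      # prediction with the wrong model
        return J + theta * x**2            # tunable terminal cost
    res = minimize(cost, np.zeros(N), method="SLSQP",
                   bounds=[(-u_max, u_max)] * N)  # input bounds = "safety"
    return float(res.x[0])                 # receding horizon: apply first input

def closed_loop_cost(theta, T=30, seed=0):
    """Roll out the MPC policy on the true plant; accumulate stage cost.
    A fixed seed gives common random numbers for finite differences."""
    rng = np.random.default_rng(seed)
    x, J = 1.0, 0.0
    for _ in range(T):
        u = mpc_policy(x, theta)
        J += x**2 + r * u**2
        x = a_true * x + b_true * u + rng.uniform(-0.05, 0.05)
    return J

# RL step: finite-difference policy-gradient estimate on the MPC parameter.
theta, lr, eps = 1.0, 0.05, 0.1
for it in range(20):
    grad = (closed_loop_cost(theta + eps)
            - closed_loop_cost(theta - eps)) / (2 * eps)
    theta -= lr * grad
    print(f"iter {it:2d}  theta = {theta:.3f}")
```

Because every input applied to the plant is produced by the constrained MPC solve, the learning update can never select an input outside the admissible set; in the paper this role is played by a robust MPC formulation rather than simple input saturation.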
