Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning

Learning-based methods have been successful in solving complex control tasks without significant prior knowledge about the system. However, these methods typically do not provide any safety guarantees, which prevents their use in safety-critical, real-world applications. In this paper, we present a learning-based model predictive control scheme that can provide provable high-probability safety guarantees. To this end, we exploit regularity assumptions on the dynamics in terms of a Gaussian process prior to construct provably accurate confidence intervals on predicted trajectories. Unlike previous approaches, we do not assume that model uncertainties are independent. Based on these predictions, we guarantee that trajectories satisfy safety constraints. Moreover, we use a terminal set constraint to recursively guarantee the existence of safe control actions at every iteration. In our experiments, we show that the resulting algorithm can be used to safely and efficiently explore and learn about dynamic systems.
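
As a rough illustration of the confidence intervals mentioned above, the following minimal sketch computes one-step bounds from a Gaussian process posterior and checks them against box constraints. It is not the paper's algorithm: the squared-exponential kernel, the hyperparameters, the scaling factor `beta`, and the helper names `rbf_kernel`, `gp_confidence_interval`, and `is_safe` are all assumptions made for this example, and the multi-step trajectory propagation and terminal set constraint are not reproduced.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between inputs a (n, d) and b (m, d)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_confidence_interval(x_train, y_train, x_query, noise_std=0.05, beta=2.0):
    """Return (lower, upper) = posterior mean -/+ beta * posterior std.

    beta is a hypothetical scaling factor; a provable guarantee would need
    a value derived from the regularity assumptions on the dynamics.
    """
    k_tt = rbf_kernel(x_train, x_train) + noise_std**2 * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    k_qq_diag = np.diag(rbf_kernel(x_query, x_query))

    # Cholesky-based solve for numerical stability.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha

    v = np.linalg.solve(chol, k_qt.T)
    var = np.clip(k_qq_diag - np.sum(v**2, axis=0), 0.0, None)
    std = np.sqrt(var)
    return mean - beta * std, mean + beta * std


def is_safe(lower, upper, state_min, state_max):
    """True if every confidence interval lies inside the box constraint."""
    return bool(np.all(lower >= state_min) and np.all(upper <= state_max))


# Toy data: noise-free samples of an assumed scalar function near the origin.
x_train = np.linspace(-1.0, 1.0, 8).reshape(-1, 1)
y_train = np.sin(x_train).ravel()

# One query close to the data and one far away from it.
x_query = np.array([[0.0], [5.0]])
lower, upper = gp_confidence_interval(x_train, y_train, x_query)

print("interval widths:", upper - lower)
print("near query safe:", is_safe(lower[:1], upper[:1], -1.0, 1.0))
print("all queries safe:", is_safe(lower, upper, -1.0, 1.0))
```

The interval is narrow near the training data and wide far from it, so a constraint check based on these bounds accepts only predictions in regions the model has already learned about. This is the intuition behind using such bounds for safe exploration.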
