Combining Model-Based Design and Model-Free Policy Optimization to Learn Safe, Stabilizing Controllers

This paper introduces a framework for learning a safe, stabilizing controller for a system with unknown dynamics using model-free policy optimization algorithms. Using a nominal dynamics model, the user specifies a candidate Control Lyapunov Function (CLF) around the desired operating point and a Control Barrier Function (CBF) that defines the desired safe set. Drawing on penalty methods from the optimization literature, we then develop a family of policy optimization problems that minimize control effort while satisfying the pointwise stability and safety constraints associated with the CLF and CBF. We demonstrate that when the penalty terms are scaled appropriately, the optimization prioritizes the maintenance of safety over stability, and stability over optimality. We discuss how standard reinforcement learning algorithms can be applied to the problem and validate the approach through simulation. Finally, we illustrate how the approach extends to a class of hybrid models commonly used in the dynamic walking literature, and use it to learn safe, stable walking behavior over a randomly spaced sequence of stepping stones.
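As a rough illustration of the penalty structure described above, the sketch below shows one way the pointwise CLF and CBF conditions could be folded into a scalar reward for a model-free learner. All names, signatures, and weight values here (clf_violation, cbf_violation, penalized_reward, w_stab, w_safe, gamma, alpha) are hypothetical choices for illustration, not the paper's implementation; the only point being made is that the safety penalty dominates the stability penalty, which in turn dominates the control-effort term.

```python
import numpy as np

# Minimal sketch (not the authors' implementation) of a penalty-based reward
# for model-free policy optimization. The CLF decrease and CBF invariance
# conditions are evaluated along an observed transition using finite
# differences; the weights are placeholders chosen so that
# w_safe >> w_stab >> 1.

def clf_violation(V, V_next, dt, gamma=1.0):
    # Positive when the CLF decrease condition V_dot <= -gamma * V fails.
    V_dot = (V_next - V) / dt
    return max(0.0, V_dot + gamma * V)

def cbf_violation(h, h_next, dt, alpha=1.0):
    # Positive when the CBF condition h_dot >= -alpha * h fails
    # (h >= 0 defines the user-specified safe set).
    h_dot = (h_next - h) / dt
    return max(0.0, -(h_dot + alpha * h))

def penalized_reward(u, V, V_next, h, h_next, dt,
                     w_stab=1e2, w_safe=1e4):
    # Control-effort cost plus scaled constraint penalties, mirroring the
    # ordering of safety over stability over optimality.
    effort = float(np.dot(u, u))
    return -(effort
             + w_stab * clf_violation(V, V_next, dt)
             + w_safe * cbf_violation(h, h_next, dt))
```

Under these assumptions, a standard off-policy algorithm could be trained against this reward using only observed values of V and h along trajectories, without direct access to the true dynamics model.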
