Paper Information - Reinforcement Learning for Safety-Critical Control under Model Uncertainty, using Control Lyapunov Functions and Control Barrier Functions

This paper addresses model uncertainty in safety-critical control with a data-driven approach. We build on the structure of an input-output linearization controller derived from a nominal model, combined with a quadratic program based on a Control Barrier Function and a Control Lyapunov Function (CBF-CLF-QP). Specifically, we propose a novel reinforcement learning framework that learns the model uncertainty present in the CBF and CLF constraints, as well as in other control-affine dynamic constraints of the quadratic program. The trained policy is combined with the nominal model-based CBF-CLF-QP, yielding the Reinforcement Learning-based CBF-CLF-QP (RL-CBF-CLF-QP), which addresses model uncertainty in the safety constraints. The method is validated on an underactuated nonlinear bipedal robot walking on randomly spaced stepping stones with one-step preview, achieving stable and safe walking under model uncertainty.
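To make the idea of correcting CBF and CLF constraints concrete, the following is a minimal sketch on a toy scalar system, not the paper's bipedal setup or its implementation. Everything in it is an assumption chosen for illustration: the nominal model x_dot = u, the unmodeled drift of -0.5, the functions V(x) = x^2 and B(x) = x - 1, the gains, and the names `cbf_clf_qp_scalar`, `controller`, `simulate`, `d_clf`, and `d_cbf`. The residual terms are hand-coded stand-ins for what a trained policy might output; no learning is performed here. Because the input is scalar, the QP can be solved in closed form.

```python
def cbf_clf_qp_scalar(a_clf, b_clf, a_cbf, b_cbf, p=100.0):
    """Closed-form solution for scalar u of the QP
         min_{u, d}  u^2 + p * d^2
         s.t.        a_clf + b_clf * u <= d    (CLF constraint, relaxed by slack d)
                     a_cbf + b_cbf * u >= 0    (CBF constraint, hard)
    """
    # With the optimal slack d = max(0, a_clf + b_clf * u), the cost is a
    # convex function of u alone; this is its unconstrained minimizer.
    if a_clf <= 0.0:
        u = 0.0
    else:
        u = -p * a_clf * b_clf / (1.0 + p * b_clf ** 2)
    # A 1-D convex problem on a half-line is solved by clipping.
    if b_cbf > 0.0:
        u = max(u, -a_cbf / b_cbf)
    elif b_cbf < 0.0:
        u = min(u, -a_cbf / b_cbf)
    elif a_cbf < 0.0:
        raise ValueError("CBF constraint is infeasible")
    return u


def controller(x, d_clf=0.0, d_cbf=0.0, lam=1.0, gamma=5.0, x_min=1.0):
    """CBF-CLF-QP built on the nominal model x_dot = u (assumed toy model).

    d_clf and d_cbf are additive corrections to the drift terms of the CLF
    and CBF constraints (hypothetical stand-ins for learned residuals).
    """
    V, LfV, LgV = x * x, 0.0, 2.0 * x   # CLF: drive x toward 0
    B, LfB, LgB = x - x_min, 0.0, 1.0   # CBF: keep x >= x_min
    return cbf_clf_qp_scalar(
        a_clf=LfV + d_clf + lam * V, b_clf=LgV,
        a_cbf=LfB + d_cbf + gamma * B, b_cbf=LgB,
    )


def simulate(use_residual, x0=2.0, dt=1e-3, steps=5000, drift=-0.5):
    """Forward-Euler rollout on the true system x_dot = u + drift.

    Returns the lowest value of x reached during the rollout.
    """
    x, lowest = x0, x0
    for _ in range(steps):
        if use_residual:
            # Exact residuals for this toy drift; a policy would estimate them.
            u = controller(x, d_clf=2.0 * x * drift, d_cbf=drift)
        else:
            u = controller(x)
        x += dt * (u + drift)
        lowest = min(lowest, x)
    return lowest


if __name__ == "__main__":
    print("lowest x, nominal QP only:   ", simulate(use_residual=False))
    print("lowest x, with residual terms:", simulate(use_residual=True))
```

In this toy setting the nominal QP lets the state settle below the safe bound x = 1, because the barrier condition is enforced on the wrong dynamics, while the version with corrected constraint terms stays at or above the bound. This mirrors, in a very simplified way, why the uncertainty in the constraints themselves has to be accounted for.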
