Learning Min-norm Stabilizing Control Laws for Systems with Unknown Dynamics

This paper introduces a framework for learning a minimum-norm stabilizing controller for a system with unknown dynamics using model-free policy optimization methods. The approach begins by designing a Control Lyapunov Function (CLF) for a (possibly inaccurate) model of the system dynamics, along with a function that specifies a minimum acceptable rate of energy dissipation for the CLF at each point in the state space. Treating the energy dissipation condition as a constraint on the desired closed-loop behavior of the real-world system, we use penalty methods to formulate an unconstrained optimization problem over the parameters of a learned controller, which can be solved with model-free policy optimization algorithms using data collected from the plant. We discuss when this optimization yields a stabilizing controller for the real-world system and derive conditions on the structure of the learned controller that ensure the optimization is strongly convex, so that the globally optimal solution can be found reliably. We validate the approach in simulation, first for a double pendulum, and then generalize the framework to learn stable walking controllers for underactuated bipedal robots using the Hybrid Zero Dynamics framework. By encoding a large amount of structure into the learning problem, we are able to learn stabilizing controllers for both systems with only minutes or even seconds of training data.
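To make the formulation concrete, below is a minimal sketch of the penalty-based surrogate described in the abstract, written under illustrative assumptions rather than as the authors' implementation: a simple quadratic CLF V, a proportional dissipation rate alpha, a linear learned controller u_theta, a finite-difference estimate of the CLF derivative from sampled transitions (so no model of the plant is needed), and a basic random-search update standing in for the model-free policy optimizer. All function names and parameter choices here are hypothetical placeholders.

import numpy as np

def V(x):
    # CLF designed for the nominal model (illustrative quadratic choice).
    return 0.5 * float(x @ x)

def alpha(x):
    # Minimum acceptable rate of energy dissipation at state x.
    return 0.1 * V(x)

def u_theta(theta, x):
    # Learned controller; linear state feedback as a simple stand-in.
    return theta.reshape(-1, x.size) @ x

def penalty_loss(theta, states, next_states, dt, lam=10.0):
    # Unconstrained penalty surrogate for the constrained min-norm problem
    #   minimize ||u(x)||^2  subject to  Vdot(x, u(x)) <= -alpha(x),
    # with Vdot estimated from collected transitions rather than a model.
    total = 0.0
    for x, x_next in zip(states, next_states):
        u = u_theta(theta, x)
        Vdot = (V(x_next) - V(x)) / dt          # model-free estimate of Vdot
        violation = max(0.0, Vdot + alpha(x))   # dissipation-constraint residual
        total += float(u @ u) + lam * violation
    return total / len(states)

def random_search_step(theta, states, next_states, dt,
                       step=1e-2, sigma=1e-2, n_dirs=8):
    # Antithetic two-point gradient estimate, in the spirit of the simple
    # random-search policy optimizers cited by the paper.
    grad = np.zeros_like(theta)
    for _ in range(n_dirs):
        delta = np.random.randn(*theta.shape)
        l_plus = penalty_loss(theta + sigma * delta, states, next_states, dt)
        l_minus = penalty_loss(theta - sigma * delta, states, next_states, dt)
        grad += (l_plus - l_minus) / (2.0 * sigma) * delta
    return theta - step * grad / n_dirs

In this sketch, the ||u||^2 term is what makes the learned controller minimum-norm, while the max(0, Vdot + alpha) penalty enforces the energy-dissipation constraint on data from the plant. Whether such a surrogate is strongly convex depends on how u_theta is parameterized; the paper derives conditions of this kind for its particular controller structure so that the global optimum can be found reliably.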
