Feedback Linearization for Unknown Systems via Reinforcement Learning

We present a novel approach to control design for nonlinear systems that leverages model-free policy optimization techniques to learn a linearizing controller for a physical plant with unknown dynamics. Feedback linearization is a technique from nonlinear control which renders the input-output dynamics of a nonlinear plant \emph{linear} under application of an appropriate feedback controller. Once a linearizing controller has been constructed, desired output trajectories for the nonlinear plant can be tracked using a variety of linear control techniques. However, the calculation of a linearizing controller requires a precise dynamics model for the system. As a result, model-based approaches for learning exact linearizing controllers generally require a simple, highly structured model of the system with easily identifiable parameters. In contrast, the model-free approach presented in this paper approximates the linearizing controller for the plant using general function approximation architectures. Specifically, we formulate a continuous-time optimization problem over the parameters of a learned linearizing controller whose optima are the set of parameters that best linearize the plant. We derive conditions under which the learning problem is (strongly) convex and provide guarantees which ensure the true linearizing controller for the plant is recovered. We then discuss how model-free policy optimization algorithms can be used to solve a discrete-time approximation to the problem using data collected from the real-world plant. The utility of the framework is demonstrated in simulation and on a real-world robotic platform.
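To illustrate the feedback-linearization idea the abstract builds on, the following is a minimal sketch for a simple pendulum with \emph{known} dynamics (the paper's contribution is learning such a controller when the dynamics are unknown). All parameter values, gains, and function names here are illustrative assumptions, not taken from the paper: for $\ddot{\theta} = -(g/\ell)\sin\theta + u/(m\ell^2)$, the feedback law $u = m\ell^2\,(v + (g/\ell)\sin\theta)$ cancels the nonlinearity, yielding the linear double-integrator $\ddot{\theta} = v$, which a standard PD law can then stabilize.

```python
import numpy as np

# Pendulum parameters (illustrative values, not from the paper)
m, l, g = 1.0, 1.0, 9.81

def linearizing_controller(x, v):
    """Cancel the gravity nonlinearity so theta_ddot = v exactly."""
    return m * l**2 * (v + (g / l) * np.sin(x[0]))

def step(x, u, dt=1e-3):
    """Euler step of theta_ddot = -(g/l) sin(theta) + u / (m l^2)."""
    xdot = np.array([x[1], -(g / l) * np.sin(x[0]) + u / (m * l**2)])
    return x + dt * xdot

# Track a constant reference with a PD law on the *linearized* system.
theta_ref = 0.5
kp, kd = 25.0, 10.0
x = np.array([0.0, 0.0])  # state: [theta, theta_dot]
for _ in range(5000):  # simulate 5 seconds
    v = kp * (theta_ref - x[0]) - kd * x[1]  # virtual linear input
    u = linearizing_controller(x, v)
    x = step(x, u)

print(abs(x[0] - theta_ref))  # tracking error after 5 s (should be small)
```

Replacing the hand-derived `linearizing_controller` with a parameterized function approximator, and tuning its parameters from plant data via policy optimization, is the setting the paper addresses.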
