Technical Report: Adaptive Control for Linearizable Systems Using On-Policy Reinforcement Learning

This paper proposes a framework for adaptively learning a feedback linearization-based tracking controller for an unknown system using discrete-time, model-free policy-gradient parameter update rules. The primary advantage of the scheme over standard model-reference adaptive control techniques is that it does not require the learned inverse model to be invertible at all instants of time. This enables the use of general function approximators to represent the linearizing controller without concern for singularities. Because the policy-gradient updates are random, the overall learning system is stochastic; we therefore combine analysis techniques commonly employed in the machine learning literature with stability arguments from adaptive control to show that, with high probability, the tracking and parameter errors concentrate near zero under a standard persistency-of-excitation condition. A simulated double pendulum demonstrates the utility of the proposed theory.
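To make the flavor of such an update rule concrete, the following is a minimal sketch, not the paper's exact algorithm: a REINFORCE-style, model-free policy-gradient update of a linear-in-parameters linearizing controller for a one-degree-of-freedom system with unknown drift and input gain. The feature map phi, the reference trajectory, the gains, and all hyperparameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
dt, T, episodes = 0.02, 200, 300
sigma, lr = 0.1, 1e-4   # exploration noise std. dev. and policy-gradient step size

def f(x):               # true drift (unknown to the learner)
    return -9.81 * np.sin(x[0]) - 0.1 * x[1]

def g(x):               # true input gain (unknown to the learner)
    return 1.0 + 0.5 * np.cos(x[0])

def phi(x, v):          # assumed feature map for the learned linearizing controller
    return np.array([np.sin(x[0]), x[1], 1.0, v, v * np.cos(x[0])])

theta = np.zeros(5)     # controller parameters, u = theta @ phi(x, v)

for ep in range(episodes):
    x = np.array([0.5, 0.0])
    grad, ret = np.zeros_like(theta), 0.0
    for k in range(T):
        t = k * dt
        xd, xd_dot, xd_ddot = np.sin(t), np.cos(t), -np.sin(t)    # reference trajectory
        e = np.array([x[0] - xd, x[1] - xd_dot])
        v = xd_ddot - 2.0 * e[1] - 1.0 * e[0]                     # stabilizing pseudo-input
        u_mean = theta @ phi(x, v)
        u = u_mean + sigma * rng.standard_normal()                # Gaussian exploration
        grad += (u - u_mean) / sigma**2 * phi(x, v)               # log-likelihood gradient
        ret += -float(e @ e)                                      # negative tracking cost
        # Euler step of the true dynamics; input clipped only for numerical safety in this toy
        x = x + dt * np.array([x[1], f(x) + g(x) * np.clip(u, -20.0, 20.0)])
    theta += lr * ret * grad / T                                  # REINFORCE update

print("learned controller parameters:", theta)

In practice such an estimator has high variance, and baselines or the variance-reduction techniques cited below would be used; the sketch only illustrates that the update needs no model of f or g and no inversion of the learned controller.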
