Improving the Robustness of Reinforcement Learning Policies with L1 Adaptive Control

A reinforcement learning (RL) control policy trained in a nominal environment can fail in a new or perturbed environment because of dynamic variations. For controlling systems with continuous state and action spaces, we propose an add-on approach that robustifies a pre-trained RL policy by augmenting it with an L1 adaptive controller (L1AC). Leveraging the L1AC's capability for fast estimation and active compensation of dynamic variations, the proposed approach improves the robustness of an RL policy that was trained, either in a simulator or in the real world, without accounting for a broad class of dynamic variations. Numerical and real-world experiments empirically demonstrate the efficacy of the proposed approach in robustifying RL policies trained using both model-free and model-based methods. A video of the experiments on a real Pendubot setup is available at https://youtu.be/xgOB9vpyUgE.
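To make the augmentation concrete, the following is a minimal scalar sketch of the idea: an L1 adaptive loop (state predictor, piecewise-constant adaptation law, low-pass filter) wrapped around a fixed baseline policy. Everything here is an assumption for illustration, not the paper's implementation: the plant `x_dot = u + d`, the gains, and the proportional controller `u_rl = -k*x` standing in for a pre-trained RL policy.

```python
import math

# Illustrative L1 adaptive augmentation of a fixed baseline policy (a sketch,
# not the paper's implementation). The scalar plant, all gains, and the
# proportional stand-in for the RL policy are assumptions.
#
# Plant:      x_dot  = u + d                          (d: unknown disturbance)
# Predictor:  xh_dot = -a*(xh - x) + u + sigma_hat    (-a: Hurwitz pole)
# Adaptation: piecewise-constant law on the prediction error xh - x
# Filter:     first-order low-pass on -sigma_hat before it enters u
# Control:    u = u_rl(x) + u_l1

Ts = 0.002      # sample period [s]
a = 10.0        # predictor pole magnitude
omega = 20.0    # low-pass filter bandwidth [rad/s]
k = 2.0         # gain of the proportional stand-in for the RL policy
d = 1.0         # unknown constant disturbance acting on the plant

# Piecewise-constant adaptation gain for the scalar predictor:
# sigma_hat = -Phi^{-1} * e^{-a*Ts} * (xh - x), Phi = (1 - e^{-a*Ts}) / a
adapt_gain = a * math.exp(-a * Ts) / (1.0 - math.exp(-a * Ts))

def simulate(T=5.0, use_l1=True):
    x, xh, u_l1, sigma_hat = 0.0, 0.0, 0.0, 0.0
    for _ in range(int(T / Ts)):
        sigma_hat = -adapt_gain * (xh - x)   # piecewise-constant update
        u_rl = -k * x                        # stand-in for the RL policy
        u = u_rl + (u_l1 if use_l1 else 0.0)
        # forward-Euler integration of plant, predictor, and filter
        x_dot = u + d
        xh_dot = -a * (xh - x) + u + sigma_hat
        uf_dot = omega * (-sigma_hat - u_l1)
        x += Ts * x_dot
        xh += Ts * xh_dot
        u_l1 += Ts * uf_dot
    return x, sigma_hat

x_aug, sig = simulate(use_l1=True)    # augmented: x settles near zero
x_rl, _ = simulate(use_l1=False)      # baseline alone: steady offset ~ d/k
```

The low-pass filter is the characteristic L1AC design choice: it decouples the (fast) estimation loop from the control channel, so the adaptation rate can be made high without injecting high-frequency content into the plant input.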
