Online inverse reinforcement learning for nonlinear systems with adversarial attacks

In the inverse reinforcement learning (RL) problem, there are two agents: an expert and a learner. The learner observes the expert's state and control-input trajectories and seeks to mimic them. To do so, it uses these observations to reconstruct the expert's unknown performance objective. This article develops novel inverse RL algorithms for the case in which both agents have continuous-time nonlinear dynamics and are subject to adversarial attacks. We first propose an offline inverse RL algorithm that lets the learner reconstruct the expert's unknown performance objective. This offline algorithm is based on integral RL (IRL) and requires only partial knowledge of the system dynamics. It has two learning stages: an optimal control learning stage, followed by a second stage based on inverse optimal control. Building on the offline algorithm, we then develop an online inverse RL algorithm that solves the inverse RL problem in real time without knowledge of the system drift dynamics. This online adaptive learning method simultaneously tunes four neural networks (NNs): a critic NN, an actor NN, an adversary NN, and a state-penalty NN. Convergence of the algorithms and stability of both the learner system and the synchronously tuned NNs are guaranteed. Simulation examples verify the effectiveness of the online method.
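The integral RL idea behind the optimal control learning stage can be illustrated in its simplest special case. The sketch below is not the article's NN-based algorithm: it assumes linear dynamics dx/dt = A x + B u + D d, a quadratic cost rate x'Qx + u'Ru - gamma^2 d'd, and fixed linear learner and adversary policies u = -K x and d = L x. All matrices, gains, the window length T, and helper names such as `collect_windows` are hypothetical choices for illustration. It solves the integral Bellman equation V(x(t)) = (integral of the cost over [t, t+T]) + V(x(t+T)) by least squares over recorded trajectory windows, so the value matrix P and the improved policies are obtained without using the drift matrix A. A appears only in the simulator that stands in for measured data and in a model-based check.

```python
import numpy as np

# Hypothetical linear-quadratic zero-sum example (assumption, not from the article):
#   dx/dt = A x + B u + D d,   cost rate  x'Qx + u'Ru - gamma^2 d'd
A = np.array([[0.0, 1.0], [-1.0, -0.5]])  # drift: used ONLY to simulate "measured" data
B = np.array([[0.0], [1.0]])              # control input matrix (known to the learner)
D = np.array([[1.0], [0.0]])              # attack input matrix (known to the learner)
Q, R, gamma = np.eye(2), np.array([[1.0]]), 5.0

K = np.array([[1.0, 1.0]])  # current learner policy    u = -K x
L = np.array([[0.1, 0.0]])  # current adversary policy  d =  L x


def cost_rate(x):
    """Running cost x'Qx + u'Ru - gamma^2 d'd under the current policies."""
    u, d = -K @ x, L @ x
    return x @ Q @ x + u @ R @ u - gamma**2 * d @ d


def augmented_rhs(z):
    """Simulator: state derivative, plus the running cost as an extra state z[2]."""
    x = z[:2]
    dx = A @ x + B @ (-K @ x) + D @ (L @ x)
    return np.append(dx, cost_rate(x))


def rk4_step(z, h):
    """One classical Runge-Kutta step of the augmented system."""
    k1 = augmented_rhs(z)
    k2 = augmented_rhs(z + 0.5 * h * k1)
    k3 = augmented_rhs(z + 0.5 * h * k2)
    k4 = augmented_rhs(z + h * k3)
    return z + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


def phi(x):
    """Quadratic basis such that x'Px = phi(x) @ [P11, P12, P22]."""
    return np.array([x[0] ** 2, 2 * x[0] * x[1], x[1] ** 2])


def collect_windows(x0_list, n_windows=5, T=0.1, h=1e-3):
    """Rows phi(x(t)) - phi(x(t+T)) and targets = cost integrated over [t, t+T]."""
    rows, targets = [], []
    steps = int(round(T / h))
    for x0 in x0_list:
        x = np.asarray(x0, dtype=float)
        for _window in range(n_windows):
            z = np.append(x, 0.0)  # restart the cost accumulator for each window
            for _step in range(steps):
                z = rk4_step(z, h)
            rows.append(phi(x) - phi(z[:2]))
            targets.append(z[2])
            x = z[:2]
    return np.array(rows), np.array(targets)


# Policy evaluation from data: least-squares solution of the IRL Bellman equation.
Phi, y = collect_windows([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
P_irl = np.array([[w[0], w[1]], [w[1], w[2]]])

# Policy improvement needs only B, D, R, gamma (no drift matrix A).
K_next = np.linalg.solve(R, B.T @ P_irl)  # u = -R^{-1} B' P x
L_next = D.T @ P_irl / gamma**2           # d = gamma^{-2} D' P x

# Model-based check (uses A): closed-loop Lyapunov equation Ac'P + P Ac = -M.
Ac = A - B @ K + D @ L
M = Q + K.T @ R @ K - gamma**2 * L.T @ L
I2 = np.eye(2)
P_true = np.linalg.solve(np.kron(Ac.T, I2) + np.kron(I2, Ac.T), -M.ravel()).reshape(2, 2)
print(np.round(P_irl, 6))
print(np.round(P_true, 6))
```

In a full policy-iteration loop, `K_next` and `L_next` would replace `K` and `L` and the evaluation would be repeated. This sketch covers only the forward optimal-control stage for fixed penalty weights; it does not reproduce the inverse optimal control stage or the article's simultaneous critic, actor, adversary, and state-penalty NN tuning for nonlinear systems.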
