$H_{\infty}$ Tracking Control of Completely Unknown Continuous-Time Systems via Off-Policy Reinforcement Learning

This paper deals with the design of an $H_\infty$ tracking controller for nonlinear continuous-time systems with completely unknown dynamics. A general bounded $L_2$-gain tracking problem with a discounted performance function is introduced for the $H_\infty$ tracking problem. A tracking Hamilton-Jacobi-Isaacs (HJI) equation is then developed whose solution gives a Nash equilibrium of the associated min-max optimization problem. A rigorous analysis of the bounded $L_2$-gain and stability of the control solution obtained by solving the tracking HJI equation is provided. An upper bound is found for the discount factor that ensures local asymptotic stability of the tracking error dynamics. An off-policy reinforcement learning algorithm is used to learn the solution to the tracking HJI equation online without requiring any knowledge of the system dynamics, and convergence of the algorithm to this solution is shown. Simulation examples are provided to verify the effectiveness of the proposed method.
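For concreteness, here is a sketch of the discounted bounded $L_2$-gain formulation the abstract refers to. The notation is assumed for illustration and may differ from the paper's exact definitions: $e$ denotes the tracking error, $\alpha > 0$ the discount factor, $\gamma > 0$ the attenuation level, and $Q$, $R$ are positive-definite weighting matrices. The discounted performance index of the two-player zero-sum game takes the standard form
$$
J(u,d) \;=\; \int_{t}^{\infty} e^{-\alpha(\tau - t)}\left( e^{\top} Q\, e + u^{\top} R\, u - \gamma^{2}\, d^{\top} d \right) d\tau ,
$$
and the discounted disturbance-attenuation (bounded $L_2$-gain) requirement reads
$$
\int_{t}^{\infty} e^{-\alpha(\tau - t)}\left( e^{\top} Q\, e + u^{\top} R\, u \right) d\tau \;\le\; \gamma^{2} \int_{t}^{\infty} e^{-\alpha(\tau - t)}\, \| d(\tau) \|^{2}\, d\tau .
$$
The tracking HJI equation is the stationarity condition of the min-max game $\min_{u}\max_{d} J(u,d)$; the off-policy learning step evaluates this equation along measured trajectories, which is why the drift and input dynamics of the system never need to appear explicitly in the update.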
