Adaptive optimal tracking controls of unknown multi-input systems based on nonzero-sum game theory

Abstract This paper addresses the optimal tracking control problem (OTCP) for unknown multi-input systems by using a reinforcement learning (RL) scheme and nonzero-sum (NZS) game theory. First, a generic method for the OTCP of multi-input systems is formulated with steady-state controls and optimal feedback controls based on NZS game theory. Then a three-layer neural network (NN) identifier is introduced to approximate the unknown system, and the input dynamics are obtained by differentiating the identifier. To transform the OTCP into an optimal regulation problem, an augmented multi-input system is constructed from the tracking error and the commanded trajectory. Moreover, an NN-based RL method is used to learn the optimal value functions of all the inputs online; these value functions are then used directly to calculate the optimal tracking controls. All the NN weights are tuned synchronously online with a newly introduced adaptation law based on the estimation error. The convergence of the NN weights and the stability of the closed-loop system are analyzed. Finally, a two-motor driven servo system and another nonlinear system are presented to illustrate the feasibility of the algorithm for both linear and nonlinear multi-input systems.
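
The augmentation and the NZS value functions mentioned in the abstract can be sketched in a standard form. The block below is only an illustrative formulation under stated assumptions, not necessarily the one used in the paper: the input-affine dynamics, the trajectory generator h, the discount factor \gamma, and the weights Q_i and R_{ij} are all assumed symbols that do not appear in the abstract.

```latex
% Assumed input-affine plant with N inputs and an assumed trajectory generator
\dot{x} = f(x) + \sum_{j=1}^{N} g_j(x)\,u_j, \qquad \dot{x}_d = h(x_d)

% Tracking error and augmented state
e = x - x_d, \qquad X = \begin{bmatrix} e^{\top} & x_d^{\top} \end{bmatrix}^{\top}

% Augmented dynamics (a regulation problem in X)
\dot{X} = F(X) + \sum_{j=1}^{N} G_j(X)\,u_j, \quad
F(X) = \begin{bmatrix} f(e + x_d) - h(x_d) \\ h(x_d) \end{bmatrix}, \quad
G_j(X) = \begin{bmatrix} g_j(e + x_d) \\ 0 \end{bmatrix}

% One discounted value function per input (player), with \bar{Q}_i = \mathrm{diag}(Q_i, 0)
V_i(X(t)) = \int_{t}^{\infty} e^{-\gamma(\tau - t)}
  \Big( X^{\top} \bar{Q}_i X + \sum_{j=1}^{N} u_j^{\top} R_{ij}\, u_j \Big)\, d\tau

% Stationarity of the i-th Hamiltonian with respect to u_i gives the feedback law
u_i^{*} = -\tfrac{1}{2}\, R_{ii}^{-1}\, G_i(X)^{\top}\, \nabla V_i^{*}(X), \qquad i = 1, \dots, N
```

In this sketch each input is treated as a player that minimizes its own V_i, so the N feedback laws together form a Nash equilibrium of the NZS game. The abstract's split into steady-state and feedback controls may lead to a different, undiscounted formulation in the paper itself.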
