Off-policy synchronous iteration IRL method for multi-player zero-sum games with input constraints

Abstract In this paper, a novel synchronous off-policy method is proposed to solve the multi-player zero-sum (ZS) game when the system dynamics are completely unknown, the control actuators are constrained, and the disturbances are bounded. The cost functions are built from nonquadratic functions to reflect the input constraints. Integral reinforcement learning (IRL) is employed to solve the Hamilton–Jacobi–Bellman equation, so that knowledge of the system dynamics is no longer required. The obtained value function is proved to converge to the optimal game value. The equivalence between the traditional policy iteration (PI) algorithm and the proposed algorithm is also established for the multi-player ZS game with constrained inputs. Three neural networks are utilized: a critic neural network (CNN) to approximate the cost function, an action neural network (ANN) to approximate the control policies, and a disturbance neural network (DNN) to approximate the disturbances. Finally, a simulation example is given to demonstrate the convergence and performance of the proposed algorithm.
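
The abstract does not state the nonquadratic cost explicitly. As a minimal sketch, assume the form commonly used in the constrained-input literature for a scalar input bounded by |u| < lam: W(u) = 2 * integral from 0 to u of lam * r * atanh(v / lam) dv. The names `lam` (saturation bound), `r` (input weight), and both function names are illustrative assumptions, not taken from the paper. The code evaluates the closed form of this integral and a numerical approximation of it, so the two can be compared.

```python
import math


def nonquadratic_cost(u, lam=1.0, r=1.0):
    """Closed form of W(u) = 2 * int_0^u lam * r * atanh(v / lam) dv, for |u| < lam.

    Assumed cost form; lam and r are hypothetical parameters.
    """
    if abs(u) >= lam:
        raise ValueError("input must satisfy |u| < lam")
    ratio = u / lam
    return 2.0 * lam * r * u * math.atanh(ratio) + lam**2 * r * math.log(1.0 - ratio**2)


def nonquadratic_cost_numeric(u, lam=1.0, r=1.0, steps=10000):
    """Midpoint-rule approximation of the same integral, for comparison."""
    h = u / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h  # midpoint of the k-th subinterval
        total += lam * r * math.atanh(v / lam)
    return 2.0 * total * h


if __name__ == "__main__":
    for u in (0.1, 0.5, 0.9):
        print(u, nonquadratic_cost(u), nonquadratic_cost_numeric(u), r_quadratic := u * u)
```

Under this assumed form the penalty is close to the quadratic cost r * u^2 for small inputs and grows faster as |u| approaches the bound, which is how such a cost can encode the actuator constraint.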
