Neural-network-based learning algorithms for cooperative games of discrete-time multi-player systems with control constraints via adaptive dynamic programming

Abstract Adaptive dynamic programming (ADP), an important branch of reinforcement learning, is a powerful tool in solving various optimal control problems. However, the cooperative game issues of discrete-time multi-player systems with control constraints have rarely been investigated in this field. In order to address this issue, a novel policy iteration (PI) algorithm is proposed based on ADP technique, and its associated convergence analysis is also studied in this brief paper. For the proposed PI algorithm, an online neural network (NN) implementation scheme with multiple-network structure is presented. In the online NN-based learning algorithm, critic network, constrained actor networks and unconstrained actor networks are employed to approximate the value function, constrained and unconstrained control policies, respectively, and the NN weight updating laws are designed based on the gradient descent method. Finally, a numerical simulation example is illustrated to show the effectiveness.

[1]  Chaomin Luo,et al.  Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms , 2017, IEEE Transactions on Cybernetics.

[2]  Chaoxu Mu,et al.  Neural-network-based adaptive guaranteed cost control of nonlinear dynamical systems with matched uncertainties , 2017, Neurocomputing.

[3]  Derong Liu,et al.  Decentralized guaranteed cost control of interconnected systems with uncertainties: A learning-based optimal control strategy , 2016, Neurocomputing.

[4]  Dongbin Zhao,et al.  Using reinforcement learning techniques to solve continuous-time non-linear optimal tracking problem without system dynamics , 2016 .

[5]  Qinglai Wei,et al.  Dual iterative adaptive dynamic programming for a class of discrete-time nonlinear systems with time-delays , 2012, Neural Computing and Applications.

[6]  Huaguang Zhang,et al.  Optimal Tracking Control for a Class of Nonlinear Discrete-Time Systems With Time Delays Based on Heuristic Dynamic Programming , 2011, IEEE Transactions on Neural Networks.

[7]  F.L. Lewis,et al.  Reinforcement learning and adaptive dynamic programming for feedback control , 2009, IEEE Circuits and Systems Magazine.

[8]  Derong Liu,et al.  Online Synchronous Approximate Optimal Learning Algorithm for Multi-Player Non-Zero-Sum Games With Unknown Dynamics , 2014, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[9]  Derong Liu,et al.  Data-Based Adaptive Critic Designs for Nonlinear Robust Optimal Control With Uncertain Dynamics , 2016, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[10]  Derong Liu,et al.  Reinforcement-Learning-Based Robust Controller Design for Continuous-Time Uncertain Nonlinear Systems Subject to Input Constraints , 2015, IEEE Transactions on Cybernetics.

[11]  Frank L. Lewis,et al.  Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations , 2011, Autom..

[12]  Frank L. Lewis,et al.  Multiple Actor-Critic Structures for Continuous-Time Optimal Control Using Input-Output Data , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[13]  Frank L. Lewis,et al.  Actor–Critic-Based Optimal Tracking for Partially Unknown Nonlinear Discrete-Time Systems , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[14]  Huaguang Zhang,et al.  Near-Optimal Control for Nonzero-Sum Differential Games of Continuous-Time Nonlinear Systems Using Single-Network ADP , 2013, IEEE Transactions on Cybernetics.

[15]  Qiuye Sun,et al.  Nearly finite-horizon optimal control for a class of nonaffine time-delay nonlinear systems based on adaptive dynamic programming , 2015, Neurocomputing.

[16]  Huaguang Zhang,et al.  Neural-Network-Based Near-Optimal Control for a Class of Discrete-Time Affine Nonlinear Systems With Control Constraints , 2009, IEEE Transactions on Neural Networks.

[17]  Dongbin Zhao,et al.  MEC—A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[18]  Frank L. Lewis,et al.  Discrete-Time Deterministic $Q$ -Learning: A Novel Convergence Analysis , 2017, IEEE Transactions on Cybernetics.

[19]  Warren E. Dixon,et al.  Concurrent learning-based approximate feedback-Nash equilibrium solution of N-player nonzero-sum differential games , 2013, IEEE/CAA Journal of Automatica Sinica.

[20]  Derong Liu,et al.  Finite-horizon neuro-optimal tracking control for a class of discrete-time nonlinear systems using adaptive dynamic programming approach , 2012, Neurocomputing.

[21]  Derong Liu,et al.  Adaptive Dynamic Programming for Optimal Tracking Control of Unknown Nonlinear Systems With Application to Coal Gasification , 2014, IEEE Transactions on Automation Science and Engineering.

[22]  Derong Liu,et al.  Neural-Network-Based Optimal Control for a Class of Unknown Discrete-Time Nonlinear Systems Using Globalized Dual Heuristic Programming , 2012, IEEE Transactions on Automation Science and Engineering.

[23]  Tingwen Huang,et al.  Reinforcement learning solution for HJB equation arising in constrained optimal control problem , 2015, Neural Networks.

[24]  Tingwen Huang,et al.  Off-Policy Reinforcement Learning for $ H_\infty $ Control Design , 2013, IEEE Transactions on Cybernetics.

[25]  Derong Liu,et al.  Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[26]  Frank L. Lewis,et al.  Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[27]  Qichao Zhang,et al.  Experience Replay for Optimal Control of Nonzero-Sum Game Systems With Unknown Dynamics , 2016, IEEE Transactions on Cybernetics.

[28]  Derong Liu,et al.  Value Iteration Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems , 2016, IEEE Transactions on Cybernetics.

[29]  Huaguang Zhang,et al.  Adaptive Dynamic Programming: An Introduction , 2009, IEEE Computational Intelligence Magazine.

[30]  Tingwen Huang,et al.  Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design , 2014, Autom..

[31]  Derong Liu,et al.  Finite-Approximation-Error-Based Discrete-Time Iterative Adaptive Dynamic Programming , 2014, IEEE Transactions on Cybernetics.

[32]  Huaguang Zhang,et al.  Finite-Horizon $H_{\infty }$ Tracking Control for Unknown Nonlinear Systems With Saturating Actuators , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[33]  Qichao Zhang,et al.  Data-driven adaptive dynamic programming for continuous-time fully cooperative games with partially constrained inputs , 2017, Neurocomputing.

[34]  Haibo He,et al.  A three-network architecture for on-line learning and optimization based on adaptive dynamic programming , 2012, Neurocomputing.