Paper information - Iterative ADP learning algorithms for discrete-time multi-player games

Iterative ADP learning algorithms for discrete-time multi-player games

Adaptive dynamic programming (ADP) is an important branch of reinforcement learning for solving various optimal control problems. Most practical nonlinear systems are controlled by more than one controller. Each controller is a player, and the tradeoff between cooperation and conflict among these players can be viewed as a game. Multi-player games fall into two main categories: zero-sum games and non-zero-sum games. To obtain the optimal control policy for each player, one needs to solve the Hamilton–Jacobi–Isaacs equations for zero-sum games and a set of coupled Hamilton–Jacobi equations for non-zero-sum games. Unfortunately, these equations are generally difficult or even impossible to solve analytically. To overcome this bottleneck, this paper proposes two ADP methods: a modified gradient-descent-based online algorithm and a novel iterative offline learning approach. Furthermore, to implement the proposed methods, we employ a single-network structure, which reduces the computational burden compared with the traditional multiple-network architecture. Simulation results demonstrate the effectiveness of our schemes.

Huaguang Zhang | He Jiang
