Multiperson zero‐sum differential games for a class of uncertain nonlinear systems

In this paper, multiperson zero-sum differential games for a class of continuous-time uncertain nonlinear systems are solved using a new iterative adaptive dynamic programming (ADP) algorithm. The idea is to use the ADP technique to obtain the optimal control pair iteratively, so that the performance index function reaches the optimal solution of the zero-sum differential game without requiring the system model. It is proved that the iterative performance index functions converge to the optimal solution of the game. Stability properties of the system under the iterative control pairs are also presented. To facilitate the implementation of the iterative ADP method, neural networks are used to build the system model, approximate the performance index function, and compute the optimal control policy. Finally, two simulation examples are given to demonstrate the performance of the proposed method. Copyright (c) 2012 John Wiley & Sons, Ltd.
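To make the idea of iterating toward the saddle-point performance index concrete, the sketch below uses a much simpler setting than the one treated in the paper. It is not the paper's algorithm. It assumes a scalar linear system dx/dt = a*x + b*u + d*w with a known model, one minimizing control u and one maximizing control w, and the quadratic performance index given by the integral of q*x^2 + r*u^2 - gamma^2*w^2. All names and numbers are hypothetical. Each pass evaluates the current control pair and then updates both gains simultaneously. No neural networks or model learning are involved.

```python
import math


def iterate_zero_sum_game(a, b, d, q, r, gamma, iterations=20):
    """Simultaneous policy iteration for a scalar linear-quadratic zero-sum game.

    Illustrative assumption: the value has the form V(x) = p * x**2, the
    minimizing player uses u = -k * x and the maximizing player uses w = l * x.
    """
    p = 0.0
    history = []
    for _ in range(iterations):
        # Policy improvement: both players respond to the current value p.
        k = b * p / r
        l = d * p / gamma ** 2
        closed_loop = a - b * k + d * l
        if closed_loop >= 0:
            raise ValueError("closed loop is not stable; the index is unbounded")
        # Policy evaluation: integrate (q + r*k^2 - gamma^2*l^2) * x^2 along
        # x(t) = x0 * exp(closed_loop * t), which gives p * x0^2.
        p = (q + r * k ** 2 - gamma ** 2 * l ** 2) / (-2.0 * closed_loop)
        history.append(p)
    return p, history


def hji_residual(p, a, b, d, q, r, gamma):
    """Residual of the scalar game Riccati equation; zero at the saddle point."""
    return q + 2.0 * a * p - p ** 2 * (b ** 2 / r - d ** 2 / gamma ** 2)


if __name__ == "__main__":
    params = dict(a=-1.0, b=1.0, d=1.0, q=1.0, r=1.0, gamma=2.0)
    p, history = iterate_zero_sum_game(**params)
    print("iterates:", history[:4])
    print("residual:", hji_residual(p, **params))
```

In this scalar case the update coincides with Newton's method on the game Riccati equation, so the iterates settle quickly when the starting pair is stabilizing. The nonlinear, multiperson, model-free setting of the paper needs the function approximators described in the abstract.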
