Adaptive Dynamic Programming for Solving Non-Zero-Sum Differential Games

Abstract: In this paper, a novel adaptive dynamic programming (ADP) algorithm based on policy iteration is developed to solve multi-player non-zero-sum differential games online for continuous-time nonlinear systems. The algorithm is mathematically equivalent to a quasi-Newton iteration in a Banach space. An implementation using neural networks is given. For each player, a critic neural network learns the player's value function, and an action neural network, which shares its parameters with the corresponding critic network, learns the player's optimal control policy. All critic and action neural networks are updated online, continuously and in real time. A simulation example demonstrates the effectiveness of the developed scheme.
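The policy-iteration structure described above (evaluate every player's value under the current joint policy, then improve every player's policy) can be illustrated with a minimal sketch on the linear-quadratic special case. Here each value function is exactly quadratic, V_i(x) = x'P_i x, so no neural-network approximation is needed. This is an assumed, model-based, offline simplification with known linear dynamics and quadratic costs; it is not the paper's online critic/action neural-network scheme, and the function name `nzs_lq_policy_iteration` and its arguments are hypothetical. Policy evaluation solves one Lyapunov equation per player; policy improvement updates all gains simultaneously.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def nzs_lq_policy_iteration(A, B, Q, R, K0, iters=100, tol=1e-12):
    """Hypothetical sketch: policy iteration for an N-player LQ non-zero-sum game.

    Player i minimizes J_i = integral of (x' Q_i x + sum_j u_j' R[i][j] u_j) dt
    subject to dx/dt = A x + sum_j B_j u_j, with linear feedback u_j = -K_j x.
    K0 must be a list of gains that jointly stabilize A - sum_j B_j K_j.
    This LQ special case stands in for the paper's nonlinear NN-based method.
    """
    N = len(B)
    K = [np.array(k, dtype=float) for k in K0]
    for _ in range(iters):
        # Closed-loop matrix under the current policies of all players.
        Ac = A - sum(B[j] @ K[j] for j in range(N))
        # Policy evaluation: for each player solve
        #   Ac' P_i + P_i Ac + Q_i + sum_j K_j' R_ij K_j = 0.
        P = []
        for i in range(N):
            M = Q[i] + sum(K[j].T @ R[i][j] @ K[j] for j in range(N))
            # solve_continuous_lyapunov(a, q) solves a X + X a^H = q.
            P.append(solve_continuous_lyapunov(Ac.T, -M))
        # Policy improvement for every player at once: K_i = R_ii^{-1} B_i' P_i.
        K_new = [np.linalg.solve(R[i][i], B[i].T @ P[i]) for i in range(N)]
        step = max(np.max(np.abs(K_new[i] - K[i])) for i in range(N))
        K = K_new
        if step < tol:
            break
    return P, K


if __name__ == "__main__":
    # Scalar two-player example: dx/dt = -x + u1 + u2, Q_i = 1, R_ii = 1, R_ij = 0.
    A = np.array([[-1.0]])
    B = [np.array([[1.0]]), np.array([[1.0]])]
    Q = [np.eye(1), np.eye(1)]
    R = [[np.eye(1), np.zeros((1, 1))], [np.zeros((1, 1)), np.eye(1)]]
    P, K = nzs_lq_policy_iteration(A, B, Q, R, K0=[np.zeros((1, 1))] * 2)
    print(P[0], P[1])  # both approach the stabilizing Nash solution 1/3
```

In the scalar example the coupled equations reduce to 3p^2 + 2p - 1 = 0, whose stabilizing root is p = 1/3, and the iteration converges to it from the zero initial gains. Convergence of this simultaneous update is not guaranteed for every game; the sketch only shows the evaluate-then-improve structure, not the convergence analysis claimed in the paper.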
