Paper information - Non-zero sum games: Online learning solution of coupled Hamilton-Jacobi and coupled Riccati equations

In this paper we present an online adaptive control algorithm, based on policy iteration reinforcement learning techniques, that solves the continuous-time (CT) multi-player non-zero-sum (NZS) game with infinite horizon for linear and nonlinear systems. The adaptive algorithm learns online the solution of the coupled Riccati equations for linear systems and of the coupled Hamilton-Jacobi equations for nonlinear systems. This adaptive control method finds approximations of the optimal value and the NZS Nash equilibrium in real time, while also guaranteeing closed-loop stability. The optimal-adaptive algorithm is implemented as a separate actor/critic parametric network approximator structure for every player, and involves simultaneous continuous-time adaptation of the actor/critic networks. A persistence of excitation condition is shown to guarantee convergence of every critic to the actual optimal value function for that player. A detailed mathematical analysis is given for 2-player NZS games. Novel tuning algorithms are given for the actor/critic networks. Convergence to the Nash equilibrium is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm.
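To make the phrase "coupled Riccati equations" concrete, the following is a minimal sketch for a scalar two-player linear-quadratic game. It is not the online actor/critic algorithm of the paper: it is an offline, model-based Lyapunov iteration (policy evaluation followed by policy improvement) that needs full knowledge of the dynamics. The cost form J_i = ∫ (q_i x² + r_ii u_i² + r_ij u_j²) dt, the function names, and all numerical values are assumptions chosen for illustration. Convergence of this kind of iteration is not guaranteed for every game.

```python
def solve_scalar_nzs_game(a, b1, b2, q1, q2, r11, r22, r12, r21,
                          k1=0.0, k2=0.0, tol=1e-12, max_iter=200):
    """Offline Lyapunov iteration for a scalar 2-player LQ non-zero-sum game.

    Dynamics (assumed):  dx/dt = a*x + b1*u1 + b2*u2
    Policies:            u_i = -k_i * x
    Returns (p1, p2, k1, k2), where p_i*x**2 approximates player i's value.
    """
    p1 = p2 = 0.0
    for _ in range(max_iter):
        a_c = a - b1 * k1 - b2 * k2  # closed-loop dynamics
        if a_c >= 0:
            raise ValueError("current feedback gains are not stabilizing")
        # Policy evaluation: scalar Lyapunov equation for each player,
        # 0 = 2*a_c*p_i + q_i + r_ii*k_i**2 + r_ij*k_j**2
        p1_new = (q1 + r11 * k1 ** 2 + r12 * k2 ** 2) / (-2.0 * a_c)
        p2_new = (q2 + r22 * k2 ** 2 + r21 * k1 ** 2) / (-2.0 * a_c)
        # Policy improvement: k_i = b_i * p_i / r_ii
        k1 = b1 * p1_new / r11
        k2 = b2 * p2_new / r22
        converged = abs(p1_new - p1) < tol and abs(p2_new - p2) < tol
        p1, p2 = p1_new, p2_new
        if converged:
            break
    return p1, p2, k1, k2


def riccati_residuals(p1, p2, a, b1, b2, q1, q2, r11, r22, r12, r21):
    """Residuals of the two coupled scalar Riccati equations at (p1, p2)."""
    k1 = b1 * p1 / r11
    k2 = b2 * p2 / r22
    a_c = a - b1 * k1 - b2 * k2
    res1 = 2.0 * a_c * p1 + q1 + r11 * k1 ** 2 + r12 * k2 ** 2
    res2 = 2.0 * a_c * p2 + q2 + r22 * k2 ** 2 + r21 * k1 ** 2
    return res1, res2


if __name__ == "__main__":
    # Hypothetical game data, not taken from the paper.
    game = dict(a=-1.0, b1=1.0, b2=1.0, q1=1.0, q2=2.0,
                r11=1.0, r22=1.0, r12=0.5, r21=0.5)
    p1, p2, k1, k2 = solve_scalar_nzs_game(**game)
    print("p1, p2 =", p1, p2)
    print("residuals =", riccati_residuals(p1, p2, **game))
```

The open-loop system in this example is stable (a < 0), so zero initial gains are admissible. For an unstable system the iteration would need stabilizing initial gains `k1` and `k2`. The two equations are coupled because each player's residual depends on the other player's gain, both through the closed-loop term `a_c` and through the cross-weight `r_ij`.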

Frank L. Lewis | Kyriakos G. Vamvoudakis
