Integral Reinforcement Learning for online computation of feedback Nash strategies of nonzero-sum differential games

This paper presents an Approximate/Adaptive Dynamic Programming (ADP) algorithm that finds online the Nash equilibrium of two-player nonzero-sum differential games with linear dynamics and infinite-horizon quadratic costs. Each player uses Integral Reinforcement Learning (IRL) to compute online the infinite-horizon value function that it associates with any given set of feedback control policies. It is shown that the online algorithm is mathematically equivalent to an offline iterative method, previously introduced in the literature, that solves the set of coupled algebraic Riccati equations (ARE) underlying the game problem using complete knowledge of the system dynamics. Here we show how ADP techniques enhance the capabilities of the offline method, allowing an online solution without the requirement of complete knowledge of the system dynamics. The two participants in the continuous-time differential game compete in real time, and the feedback Nash control strategies are determined from data measured online from the system. The algorithm is built on the interplay between a learning phase, in which each player learns online the value it associates with the current set of policies, and a policy update step, performed by each player to decrease the value of its own cost. The players learn concurrently. The feasibility of the ADP scheme is demonstrated in simulation. A code sketch of the underlying iterative structure is given below.
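The following is a minimal sketch, not the authors' implementation, of the offline iterative method referred to above: Lyapunov iterations that solve the coupled AREs of a two-player nonzero-sum linear-quadratic game when the dynamics (A, B1, B2) are fully known. All matrix names and weights (Q1, Q2, R11, R12, R21, R22) are illustrative placeholders. In the paper's online IRL version, the policy-evaluation step below is replaced by a least-squares fit to integral reinforcements measured along the trajectory, which removes the requirement of complete knowledge of the system dynamics.

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def lyapunov_iterations(A, B1, B2, Q1, Q2, R11, R12, R21, R22,
                        K1, K2, n_iter=50):
    """Iteratively solve the coupled AREs of a two-player nonzero-sum LQ game.

    Starting from stabilizing feedback gains K1, K2, alternate between
    (i) policy evaluation: solve a Lyapunov equation for each player's value
        matrix P_i under the current closed-loop dynamics, and
    (ii) policy improvement: update each gain as K_i = R_ii^{-1} B_i^T P_i.
    """
    for _ in range(n_iter):
        Ac = A - B1 @ K1 - B2 @ K2                      # closed-loop matrix
        # Player i's running cost under the current policies:
        # x' Q_i x + u1' R_i1 u1 + u2' R_i2 u2  with u_j = -K_j x
        Qc1 = Q1 + K1.T @ R11 @ K1 + K2.T @ R12 @ K2
        Qc2 = Q2 + K1.T @ R21 @ K1 + K2.T @ R22 @ K2
        # Policy evaluation: solve Ac' P_i + P_i Ac + Qc_i = 0 for each player
        P1 = solve_continuous_lyapunov(Ac.T, -Qc1)
        P2 = solve_continuous_lyapunov(Ac.T, -Qc2)
        # Policy improvement: greedy update of each player's feedback gain
        K1 = np.linalg.solve(R11, B1.T @ P1)
        K2 = np.linalg.solve(R22, B2.T @ P2)
    return P1, P2, K1, K2

At convergence, P1 and P2 satisfy the coupled AREs and u_i = -K_i x are the feedback Nash strategies; the online algorithm produces the same sequence of value matrices and gains from measured data.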
