Model-Free Temporal Difference Learning for Non-Zero-Sum Games

In this paper, we consider the two-player nonzero-sum game problem for continuous-time linear dynamic systems. It is shown that solving the nonzero-sum game reduces to solving the coupled algebraic Riccati equations, which are nonlinear algebraic matrix equations. Compared with the algebraic Riccati equation of a linear dynamic system with only one player, the coupled algebraic Riccati equations of a nonzero-sum game with multiple players are more difficult to solve directly. First, the policy iteration algorithm is introduced to find the Nash equilibrium of the nonzero-sum game; finding this equilibrium is necessary and sufficient for solving the coupled algebraic Riccati equations. However, the policy iteration algorithm is offline and requires complete knowledge of the system dynamics. To overcome these issues, a novel online iterative algorithm, named the integral temporal difference learning algorithm, is developed. Moreover, an equivalent compact form of the integral temporal difference learning algorithm is also presented. It is shown that the integral temporal difference learning algorithm can be implemented online and requires only partial knowledge of the system dynamics. In addition, the closed-loop stability under the integral temporal difference learning algorithm is analyzed at each iteration step. Finally, a simulation study shows the effectiveness of the presented algorithm.
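The offline, model-based policy iteration step can be illustrated with a minimal NumPy sketch. It is not the authors' code; it assumes the standard two-player linear-quadratic setup dx/dt = Ax + B_1 u_1 + B_2 u_2, player costs J_i = integral of (x^T Q_i x + u_1^T R_i1 u_1 + u_2^T R_i2 u_2) dt, and linear feedback u_j = -K_j x. Each iteration evaluates both players by solving A_c^T P_i + P_i A_c + Q_i + K_1^T R_i1 K_1 + K_2^T R_i2 K_2 = 0 with A_c = A - B_1 K_1 - B_2 K_2, then improves K_i = R_ii^{-1} B_i^T P_i; at a fixed point these equations are the coupled algebraic Riccati equations. The helper names (lyap, nzs_policy_iteration, coupled_are_residual) and the example matrices are hypothetical. Convergence of this iteration is not guaranteed for every game, so the example uses a weakly coupled, open-loop stable system where K_1 = K_2 = 0 is an admissible start.

```python
import numpy as np


def lyap(Ac, M):
    """Solve Ac^T P + P Ac + M = 0 by Kronecker (column-major) vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    L = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    P = np.linalg.solve(L, -M.flatten(order="F")).reshape((n, n), order="F")
    return 0.5 * (P + P.T)  # remove round-off asymmetry


def player_cost_weight(i, K, Q, R):
    """State weight seen by player i when both players use u_j = -K_j x."""
    return Q[i] + K[0].T @ R[i][0] @ K[0] + K[1].T @ R[i][1] @ K[1]


def nzs_policy_iteration(A, B, Q, R, K0, max_iter=200, tol=1e-12):
    """Model-based policy iteration for a two-player LQ nonzero-sum game."""
    K = [k.copy() for k in K0]
    for _ in range(max_iter):
        Ac = A - B[0] @ K[0] - B[1] @ K[1]
        # Policy evaluation: one Lyapunov equation per player (this needs A).
        P = [lyap(Ac, player_cost_weight(i, K, Q, R)) for i in range(2)]
        # Policy improvement: K_i = R_ii^{-1} B_i^T P_i.
        K_new = [np.linalg.solve(R[i][i], B[i].T @ P[i]) for i in range(2)]
        step = max(np.abs(K_new[i] - K[i]).max() for i in range(2))
        K = K_new
        if step < tol:
            break
    return P, K


def coupled_are_residual(A, B, Q, R, P):
    """Largest entry of the coupled-ARE residuals, with K_i = R_ii^{-1} B_i^T P_i."""
    K = [np.linalg.solve(R[i][i], B[i].T @ P[i]) for i in range(2)]
    Ac = A - B[0] @ K[0] - B[1] @ K[1]
    res = [Ac.T @ P[i] + P[i] @ Ac + player_cost_weight(i, K, Q, R) for i in range(2)]
    return max(np.abs(r).max() for r in res)


# Hypothetical example: A is Hurwitz, so zero gains are an admissible start.
A = np.array([[-1.0, 0.5], [0.3, -2.0]])
B = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
Q = [np.eye(2), 2.0 * np.eye(2)]
R = [[np.eye(1), 0.5 * np.eye(1)], [0.5 * np.eye(1), np.eye(1)]]
K0 = [np.zeros((1, 2)), np.zeros((1, 2))]

P_star, K_star = nzs_policy_iteration(A, B, Q, R, K0)
print("coupled-ARE residual:", coupled_are_residual(A, B, Q, R, P_star))
```

The Kronecker-product Lyapunov solver keeps the sketch dependent on NumPy alone; for larger state dimensions a dedicated Lyapunov solver would be the more economical choice.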
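The abstract does not spell out the integral temporal difference update, so the following is only a generic integral reinforcement learning sketch of the idea that data can replace knowledge of A. It is not the authors' algorithm. It assumes the usual two-player linear-quadratic setup (dx/dt = Ax + B_1 u_1 + B_2 u_2, quadratic costs with weights Q_i and R_ij, feedback u_j = -K_j x). Over an interval of length T, each player's value V_i(x) = x^T P_i x satisfies V_i(x(t)) - V_i(x(t+T)) = integral of player i's running cost over [t, t+T]. That relation is linear in the entries of P_i, so P_i can be fitted by least squares from measured states and costs, and only B_i and R_ii enter the update K_i = R_ii^{-1} B_i^T P_i. The simulated "plant" below is a black box that hides A, and each iteration collects data from several random initial states instead of one continuing trajectory with probing noise. All function names, the interval T, and the example matrices are hypothetical.

```python
import numpy as np


def quad_basis(x):
    """Features phi(x) such that x^T P x = phi(x) @ p, p = upper triangle of P."""
    iu = np.triu_indices(len(x))
    weights = np.where(iu[0] == iu[1], 1.0, 2.0)  # off-diagonal entries count twice
    return weights * np.outer(x, x)[iu]


def unvec(p, n):
    """Rebuild the symmetric matrix P from its upper-triangular entries p."""
    U = np.zeros((n, n))
    U[np.triu_indices(n)] = p
    return U + U.T - np.diag(np.diag(U))


def make_plant(A, B, Q, R, steps=100):
    """Hypothetical black-box plant: applies u_j = -K_j x for T seconds from x0
    (RK4) and returns x(T) plus each player's integrated running cost.
    Only this closure uses A."""
    n = A.shape[0]

    def f(z, K):
        x = z[:n]
        u = [-K[0] @ x, -K[1] @ x]
        dx = A @ x + B[0] @ u[0] + B[1] @ u[1]
        dc = [x @ Q[i] @ x + u[0] @ R[i][0] @ u[0] + u[1] @ R[i][1] @ u[1]
              for i in range(2)]
        return np.concatenate([dx, np.array(dc)])

    def run(K, x0, T):
        z = np.concatenate([x0, np.zeros(2)])  # state followed by two cost integrals
        h = T / steps
        for _ in range(steps):
            k1 = f(z, K)
            k2 = f(z + 0.5 * h * k1, K)
            k3 = f(z + 0.5 * h * k2, K)
            k4 = f(z + h * k3, K)
            z = z + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        return z[:n], z[n:]

    return run


def irl_policy_iteration(plant, B, R, K0, iters=30, T=0.1, samples=8, seed=0):
    """Learner: fits P_i from data, then updates K_i using only B_i and R_ii."""
    n = B[0].shape[0]
    rng = np.random.default_rng(seed)
    K = [k.copy() for k in K0]
    for _ in range(iters):
        rows, costs = [], []
        for x0 in rng.standard_normal((samples, n)):
            xT, c = plant(K, x0, T)
            rows.append(quad_basis(x0) - quad_basis(xT))  # V_i(x0) - V_i(xT) ...
            costs.append(c)                                # ... equals the cost integral
        Phi, C = np.array(rows), np.array(costs)
        P = [unvec(np.linalg.lstsq(Phi, C[:, i], rcond=None)[0], n) for i in range(2)]
        K = [np.linalg.solve(R[i][i], B[i].T @ P[i]) for i in range(2)]
    return P, K


def coupled_are_residual(A, B, Q, R, P):
    """Verification only (needs A): largest coupled-ARE residual entry."""
    K = [np.linalg.solve(R[i][i], B[i].T @ P[i]) for i in range(2)]
    Ac = A - B[0] @ K[0] - B[1] @ K[1]
    res = [Ac.T @ P[i] + P[i] @ Ac + Q[i]
           + K[0].T @ R[i][0] @ K[0] + K[1].T @ R[i][1] @ K[1] for i in range(2)]
    return max(np.abs(r).max() for r in res)


# Hypothetical example; A is Hurwitz, so zero gains are an admissible start.
A = np.array([[-1.0, 0.5], [0.3, -2.0]])
B = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
Q = [np.eye(2), 2.0 * np.eye(2)]
R = [[np.eye(1), 0.5 * np.eye(1)], [0.5 * np.eye(1), np.eye(1)]]

plant = make_plant(A, B, Q, R)
P_hat, K_hat = irl_policy_iteration(plant, B, R, [np.zeros((1, 2)) for _ in range(2)])
print("coupled-ARE residual:", coupled_are_residual(A, B, Q, R, P_hat))
```

In this simulated example, coupled_are_residual reads A only to check the learned solution; the learner itself never accesses A.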
