Data-Driven Nonzero-Sum Game for Discrete-Time Systems Using Off-Policy Reinforcement Learning

In this paper, we develop a data-driven algorithm to learn the Nash equilibrium solution for a two-player non-zero-sum (NZS) game with completely unknown linear discrete-time dynamics based on off-policy reinforcement learning (RL). This algorithm solves the coupled algebraic Riccati equations (CARE) forward in time in a model-free manner by using the online measured data. We first derive the CARE for solving the two-player NZS game. Then, model-free off-policy RL is developed to obviate the requirement of complete knowledge of system dynamics. Besides, on- and off-policy RL algorithms are compared in terms of the robustness against the probing noise. Finally, a simulation example is presented to show the efficacy of the presented approach.

[1]  Frank L. Lewis,et al.  Reinforcement Learning and Approximate Dynamic Programming for Feedback Control , 2012 .

[2]  Qichao Zhang,et al.  Data-Based Reinforcement Learning for Nonzero-Sum Games With Unknown Drift Dynamics , 2019, IEEE Transactions on Cybernetics.

[3]  Yixin Yin,et al.  Hamiltonian-Driven Adaptive Dynamic Programming for Continuous Nonlinear Dynamical Systems , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[4]  Kun Zhang,et al.  Iterative adaptive dynamic programming methods with neural network implementation for multi-player zero-sum games , 2018, Neurocomputing.

[5]  Yixin Yin,et al.  Dynamic Intermittent Feedback Design for $H_{\infty}$ Containment Control on a Directed Graph , 2020, IEEE Transactions on Cybernetics.

[6]  Frank L. Lewis,et al.  Optimal and Autonomous Control Using Reinforcement Learning: A Survey , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[7]  Frank L. Lewis,et al.  Multi-agent differential graphical games , 2011, Proceedings of the 30th Chinese Control Conference.

[8]  Ning Sun,et al.  Antiswing Cargo Transportation of Underactuated Tower Crane Systems by a Nonlinear Controller Embedded With an Integral Term , 2019, IEEE Transactions on Automation Science and Engineering.

[9]  Huaguang Zhang,et al.  An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games , 2011, Autom..

[10]  Frank L. Lewis,et al.  $ {H}_{ {\infty }}$ Tracking Control of Completely Unknown Continuous-Time Systems via Off-Policy Reinforcement Learning , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[11]  Yixin Yin,et al.  Containment Control of Heterogeneous Systems With Non-Autonomous Leaders: A Distributed Optimal Model Reference Approach , 2018, IEEE Access.

[12]  Chaomin Luo,et al.  Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms , 2017, IEEE Transactions on Cybernetics.

[13]  Frank L. Lewis,et al.  Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem , 2009, 2009 International Joint Conference on Neural Networks.

[14]  Frank L. Lewis,et al.  Online solution of nonlinear two-player zero-sum games using synchronous policy iteration , 2010, 49th IEEE Conference on Decision and Control (CDC).

[15]  Frank L. Lewis,et al.  Linear Quadratic Tracking Control of Partially-Unknown Continuous-Time Systems Using Reinforcement Learning , 2014, IEEE Transactions on Automatic Control.

[16]  Qichao Zhang,et al.  Event-Triggered H ∞ Control for Continuous-Time Nonlinear System , 2015, ISNN.

[17]  Yixin Yin,et al.  Optimal Containment Control of Unknown Heterogeneous Systems With Active Leaders , 2019, IEEE Transactions on Control Systems Technology.

[18]  Ming He,et al.  Admissible output consensualization control for singular multi-agent systems with time delays , 2016, J. Frankl. Inst..

[19]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[20]  Frank L. Lewis,et al.  Multi-agent discrete-time graphical games and reinforcement learning solutions , 2014, Autom..

[21]  Frank L. Lewis,et al.  Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations , 2011, Autom..

[22]  Haibo He,et al.  Novel iterative neural dynamic programming for data-based approximate optimal control design , 2017, Autom..

[23]  Huaguang Zhang,et al.  Iterative ADP learning algorithms for discrete-time multi-player games , 2018, Artificial Intelligence Review.

[24]  Frank L. Lewis,et al.  Discrete-Time Deterministic $Q$ -Learning: A Novel Convergence Analysis , 2017, IEEE Transactions on Cybernetics.

[25]  M. Bacharach Two-person Cooperative Games , 1976 .

[26]  Yanhong Luo,et al.  Data-driven approximate optimal tracking control schemes for unknown non-affine non-linear multi-player systems via adaptive dynamic programming , 2017 .

[27]  Randal W. Bea Successive Galerkin approximation algorithms for nonlinear optimal and robust control , 1998 .

[28]  Kyriakos G. Vamvoudakis,et al.  Dynamic intermittent Q ‐learning–based model‐free suboptimal co‐design of ‐stabilization , 2019, International Journal of Robust and Nonlinear Control.

[29]  Frank L. Lewis,et al.  Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles , 2012 .

[30]  Eric van Damme,et al.  Non-Cooperative Games , 2000 .

[31]  Yixin Yin,et al.  Leader–Follower Output Synchronization of Linear Heterogeneous Systems With Active Leader Using Reinforcement Learning , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[32]  Kun Zhang,et al.  Data-driven adaptive dynamic programming schemes for non-zero-sum games of unknown discrete-time nonlinear systems , 2018, Neurocomputing.

[33]  Z. Gajic,et al.  Parallel Algorithms for Optimal Control of Large Scale Linear Systems , 1993 .

[34]  Xiaohong Cui,et al.  Data-based approximate optimal control for nonzero-sum games of multi-player systems using adaptive dynamic programming , 2018, Neurocomputing.

[35]  Hisham Abou-Kandil,et al.  On global existence of solutions to coupled matrix Riccati equations in closed-loop Nash games , 1996, IEEE Trans. Autom. Control..

[36]  D. Lukes Equilibrium Feedback Control in Linear Games with Quadratic Costs , 1971 .

[37]  Gang Tao Adaptive Control Design and Analysis (Adaptive and Learning Systems for Signal Processing, Communications and Control Series) , 2003 .

[38]  Huaguang Zhang,et al.  Neural-network-based learning algorithms for cooperative games of discrete-time multi-player systems with control constraints via adaptive dynamic programming , 2019, Neurocomputing.

[39]  Qichao Zhang,et al.  Event-Based Robust Control for Uncertain Nonlinear Systems Using Adaptive Dynamic Programming , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[40]  Derong Liu,et al.  Adaptive Dynamic Programming for Discrete-Time Zero-Sum Games , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[41]  Ahmed I. A. Salama,et al.  A Computational Algorithm for Solving a System of Coupled Algebraic Matrix Riccati Equations , 1974, IEEE Transactions on Computers.

[42]  Frank L. Lewis,et al.  H∞ control of linear discrete-time systems: Off-policy reinforcement learning , 2017, Autom..

[43]  Derong Liu,et al.  Integral Reinforcement Learning for Linear Continuous-Time Zero-Sum Games With Completely Unknown Dynamics , 2014, IEEE Transactions on Automation Science and Engineering.

[44]  Frank L. Lewis,et al.  Game Theory-Based Control System Algorithms with Real-Time Reinforcement Learning: How to Solve Multiplayer Games Online , 2017, IEEE Control Systems.

[45]  Y. Ho,et al.  Nonzero-sum differential games , 1969 .

[46]  Yixin Yin,et al.  Data-Driven Robust Control of Discrete-Time Uncertain Linear Systems via Off-Policy Reinforcement Learning , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[47]  Qichao Zhang,et al.  Experience Replay for Optimal Control of Nonzero-Sum Game Systems With Unknown Dynamics , 2016, IEEE Transactions on Cybernetics.

[48]  Derong Liu,et al.  Online Synchronous Approximate Optimal Learning Algorithm for Multi-Player Non-Zero-Sum Games With Unknown Dynamics , 2014, IEEE Transactions on Systems, Man, and Cybernetics: Systems.