Paper information

Data-Driven Nonzero-Sum Game for Discrete-Time Systems Using Off-Policy Reinforcement Learning

In this paper, we develop a data-driven algorithm, based on off-policy reinforcement learning (RL), that learns the Nash equilibrium solution of a two-player nonzero-sum (NZS) game with completely unknown linear discrete-time dynamics. The algorithm solves the coupled algebraic Riccati equations (CARE) forward in time, in a model-free manner, using online measured data. We first derive the CARE for the two-player NZS game. Then, a model-free off-policy RL algorithm is developed that removes the need for complete knowledge of the system dynamics. In addition, on-policy and off-policy RL algorithms are compared in terms of their robustness to probing noise. Finally, a simulation example demonstrates the efficacy of the proposed approach.
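For reference, the CARE mentioned in the abstract can be written out for the standard linear-quadratic setting. This is a sketch of the usual textbook form, not a transcription of the paper: it assumes linear state feedback $u_j=-K_jx$, quadratic costs with weights $Q_i \succeq 0$ and $R_{ij}$ (with $R_{ii} \succ 0$), and quadratic value functions $V_i(x)=x^\top P_i x$. The paper's own notation may differ.

```latex
\begin{aligned}
&x_{k+1} = A x_k + B_1 u_{1,k} + B_2 u_{2,k}, \\
&J_i = \sum_{k=0}^{\infty}\left(x_k^\top Q_i x_k + u_{1,k}^\top R_{i1} u_{1,k} + u_{2,k}^\top R_{i2} u_{2,k}\right), \quad i = 1,2, \\
&u_j = -K_j x, \qquad A_c = A - B_1 K_1 - B_2 K_2, \\
&P_i = Q_i + K_1^\top R_{i1} K_1 + K_2^\top R_{i2} K_2 + A_c^\top P_i A_c, \quad i = 1,2, \\
&(R_{11} + B_1^\top P_1 B_1) K_1 + B_1^\top P_1 B_2 K_2 = B_1^\top P_1 A, \\
&(R_{22} + B_2^\top P_2 B_2) K_2 + B_2^\top P_2 B_1 K_1 = B_2^\top P_2 A.
\end{aligned}
```

The Lyapunov-type equation gives player $i$'s cost under fixed gains; the last two lines are the first-order conditions from minimizing player $i$'s Bellman right-hand side over $u_i$ with the other player's gain held fixed. Substituting the gains they define into the Lyapunov-type equations yields the coupled Riccati equations.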
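To make "solving the CARE forward in time" concrete, the following is a minimal sketch of model-based policy iteration, the iteration that model-free methods reproduce from data. It assumes dynamics $x_{k+1}=Ax_k+B_1u_1+B_2u_2$, quadratic costs with weights $Q_i$ and $R_{ij}$, linear feedback $u_j=-K_jx$, and jointly stabilizing initial gains. It needs $A$, $B_1$, $B_2$, so it is the baseline the paper's off-policy method avoids, not the paper's algorithm; all function names are hypothetical.

```python
import numpy as np


def lyap_dt(Ac, M):
    """Solve P = Ac^T P Ac + M by vectorization (Ac assumed Schur stable)."""
    n = Ac.shape[0]
    T = Ac.T
    # With row-major flattening, vec(T P T^T) = kron(T, T) @ vec(P).
    P = np.linalg.solve(np.eye(n * n) - np.kron(T, T), M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)


def evaluate(A, B1, B2, Q, R, K1, K2):
    """Policy evaluation: value matrices P_1, P_2 of the fixed gains K1, K2."""
    Ac = A - B1 @ K1 - B2 @ K2
    return [lyap_dt(Ac, Q[i] + K1.T @ R[i, 1] @ K1 + K2.T @ R[i, 2] @ K2)
            for i in (1, 2)]


def improve(A, B1, B2, R, P1, P2):
    """Policy improvement: solve both Nash stationarity conditions jointly."""
    m1 = B1.shape[1]
    lhs = np.block([[R[1, 1] + B1.T @ P1 @ B1, B1.T @ P1 @ B2],
                    [B2.T @ P2 @ B1, R[2, 2] + B2.T @ P2 @ B2]])
    rhs = np.vstack([B1.T @ P1 @ A, B2.T @ P2 @ A])
    K = np.linalg.solve(lhs, rhs)
    return K[:m1], K[m1:]


def nzs_policy_iteration(A, B1, B2, Q, R, K1, K2, tol=1e-12, max_iter=200):
    """Model-based policy iteration for the CARE.

    Q = {i: Q_i}, R = {(i, j): R_ij}; K1, K2 must be jointly stabilizing.
    """
    for _ in range(max_iter):
        P1, P2 = evaluate(A, B1, B2, Q, R, K1, K2)
        K1_new, K2_new = improve(A, B1, B2, R, P1, P2)
        done = max(np.abs(K1_new - K1).max(), np.abs(K2_new - K2).max()) < tol
        K1, K2 = K1_new, K2_new
        if done:
            break
    P1, P2 = evaluate(A, B1, B2, Q, R, K1, K2)  # make P_i consistent with final gains
    return P1, P2, K1, K2
```

For example, with the scalar system $A=0.9$, $B_1=1$, $B_2=0.5$, unit $Q_i$, $R_{ii}=1$, $R_{ij}=0$ for $i\neq j$, and zero initial gains (admissible because $A$ is stable), the iteration settles after a few steps. Updating both gains jointly, rather than one player at a time, is a design choice of this sketch.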
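The off-policy idea can be sketched as follows, under the usual linear-quadratic assumptions ($x_{k+1}=Ax_k+B_1u_1+B_2u_2$, costs weighted by $Q_i$ and $R_{ij}$, target policies $u_j=-K_jx$). This is an illustrative reconstruction, not the paper's code, and the function names are hypothetical. Data are collected once with an arbitrary exploratory behavior policy. At each iteration, the recorded inputs are rewritten as $u_j=-K_jx+e_j$ relative to the current target gains. This turns each player's Bellman equation into a linear least-squares problem in $P_i$, $A_c^\top P_i B_j$, and $B_j^\top P_i B_l$, from which the gain update is rebuilt without $A$, $B_1$, or $B_2$. The identity holds exactly for whatever input was applied, so the probing noise does not bias the estimates; the data must still be rich enough for the regression to be well posed.

```python
import numpy as np


def _outer_rows(a, b):
    """Row k holds the row-major flattening of the outer product a[k] b[k]^T."""
    return np.einsum("ka,kb->kab", a, b).reshape(len(a), -1)


def collect_data(A, B1, B2, x0, steps, rng):
    """Record one batch of (x, u1, u2) samples under random exploratory inputs.

    The model is used only to simulate the plant; the learner never reads it.
    """
    xs, u1s, u2s = [x0], [], []
    x = x0
    for _ in range(steps):
        u1 = rng.normal(size=B1.shape[1])  # behavior policy: pure probing noise
        u2 = rng.normal(size=B2.shape[1])
        x = A @ x + B1 @ u1 + B2 @ u2
        xs.append(x)
        u1s.append(u1)
        u2s.append(u2)
    return np.array(xs), np.array(u1s), np.array(u2s)


def offpolicy_nzs(xs, u1s, u2s, Q, R, K1, K2, tol=1e-10, max_iter=50):
    """Off-policy policy iteration for the two-player NZS game, from data only.

    Q = {i: Q_i}, R = {(i, j): R_ij}; K1, K2 must be jointly stabilizing.
    The same data batch is reused at every iteration.
    """
    X, Xn = xs[:-1], xs[1:]
    n, m1, m2 = X.shape[1], u1s.shape[1], u2s.shape[1]
    shapes = [(n, n), (n, m1), (n, m2), (m1, m1), (m1, m2), (m2, m2)]
    splits = np.cumsum([r * c for r, c in shapes])[:-1]
    for _ in range(max_iter):
        # Deviation of the applied (behavior) input from the current target policy.
        e1 = u1s + X @ K1.T
        e2 = u2s + X @ K2.T
        # Unknowns per player: P_i, Ac^T P_i B1, Ac^T P_i B2,
        # B1^T P_i B1, B1^T P_i B2, B2^T P_i B2.
        Phi = np.hstack([
            _outer_rows(X, X) - _outer_rows(Xn, Xn),
            2 * _outer_rows(X, e1),
            2 * _outer_rows(X, e2),
            _outer_rows(e1, e1),
            2 * _outer_rows(e1, e2),
            _outer_rows(e2, e2),
        ])
        est = {}
        for i in (1, 2):
            S = Q[i] + K1.T @ R[i, 1] @ K1 + K2.T @ R[i, 2] @ K2
            y = np.einsum("ka,ab,kb->k", X, S, X)
            theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
            est[i] = [p.reshape(s) for p, s in zip(np.split(theta, splits), shapes)]
        P1, H1_1, _, G1_11, G1_12, _ = est[1]
        P2, _, H2_2, _, G2_12, G2_22 = est[2]
        # Nash stationarity conditions, with B_j^T P_i A rebuilt from the estimates.
        lhs = np.block([[R[1, 1] + G1_11, G1_12], [G2_12.T, R[2, 2] + G2_22]])
        rhs = np.vstack([H1_1.T + G1_11 @ K1 + G1_12 @ K2,
                         H2_2.T + G2_12.T @ K1 + G2_22 @ K2])
        K = np.linalg.solve(lhs, rhs)
        K1_new, K2_new = K[:m1], K[m1:]
        done = max(np.abs(K1_new - K1).max(), np.abs(K2_new - K2).max()) < tol
        K1, K2 = K1_new, K2_new
        if done:
            break
    return 0.5 * (P1 + P1.T), 0.5 * (P2 + P2.T), K1, K2
```

Calling `collect_data` once and then `offpolicy_nzs` with zero initial gains (admissible when $A$ is Schur stable) should reproduce, up to numerical error, the gains of model-based policy iteration, because each step solves the same evaluation and improvement equations with the model-dependent matrices estimated from data.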
