Adaptive Dynamic Programming for Discrete-Time Zero-Sum Games

This paper develops a novel adaptive dynamic programming (ADP) algorithm, called the “iterative zero-sum ADP algorithm,” to solve infinite-horizon discrete-time two-player zero-sum games for nonlinear systems. The algorithm permits arbitrary positive semidefinite functions to initialize the upper and lower iterations. A novel convergence analysis guarantees that the upper and lower iterative value functions converge to the upper and lower optima, respectively. When the saddle-point equilibrium exists, both the upper and lower iterative value functions are proved to converge to the optimal solution of the zero-sum game, without requiring the existence criteria of the saddle-point equilibrium. If the saddle-point equilibrium does not exist, the upper and lower optimal performance index functions are obtained separately and are proved to be not equivalent. Finally, simulation results and comparisons illustrate the performance of the method.
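The upper and lower iterations described above can be illustrated with a minimal sketch on a toy finite game. This is not the paper's algorithm: the abstract does not give the update equations, so the code assumes the standard definitions, a min-max backup for the upper value and a max-min backup for the lower value. The function name `zero_sum_value_iteration`, the three-state game, and the arrays `U`, `nxt`, and `V0` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def zero_sum_value_iteration(U, nxt, V0, iters=50):
    """Toy upper/lower value iteration (assumed standard min-max / max-min backups).

    U[x, u, w]   : stage cost for state x, control u (minimizer), disturbance w (maximizer)
    nxt[x, u, w] : index of the next state
    V0[x]        : arbitrary positive semidefinite initial value, with V0[0] == 0
    """
    upper = np.asarray(V0, dtype=float).copy()
    lower = np.asarray(V0, dtype=float).copy()
    for _ in range(iters):
        q_up = U + upper[nxt]                 # shape (n_x, n_u, n_w)
        q_lo = U + lower[nxt]
        upper = q_up.max(axis=2).min(axis=1)  # min over u of max over w
        lower = q_lo.min(axis=1).max(axis=1)  # max over w of min over u
    return upper, lower

# Hypothetical 3-state game: state 0 is absorbing with zero cost.
U = np.zeros((3, 2, 2))
U[1] = [[1.0, 3.0], [4.0, 2.0]]   # stage game at state 1 has no pure saddle point
U[2] = 1.0                        # unit cost at state 2, whatever the players choose
nxt = np.zeros((3, 2, 2), dtype=int)
nxt[2] = 1                        # state 2 always moves to state 1; state 1 moves to 0

V0 = np.array([0.0, 5.0, 7.0])    # arbitrary nonnegative initialization
upper, lower = zero_sum_value_iteration(U, nxt, V0)
print("upper:", upper)
print("lower:", lower)
```

In this toy game both iterations settle after two sweeps, at [0, 3, 4] for the upper value and [0, 2, 3] for the lower value. The gap at states 1 and 2 reflects the missing pure saddle point in the stage game at state 1, which mirrors the case in which the upper and lower performance index functions are not equivalent.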
