Paper Information - Adaptive Critic Designs for Discrete-Time Zero-Sum Games With Application to $H_{\infty}$ Control

Adaptive Critic Designs for Discrete-Time Zero-Sum Games With Application to $H_{\infty}$ Control

In this correspondence, adaptive critic approximate dynamic programming designs are derived to solve the discrete-time zero-sum game in which the state and action spaces are continuous. This results in a forward-in-time reinforcement learning algorithm that converges to the Nash equilibrium of the corresponding zero-sum game. The results in this correspondence can be thought of as a way to solve the Riccati equation of the well-known discrete-time $H_{\infty}$ optimal control problem forward in time. Two schemes are presented: 1) heuristic dynamic programming and 2) dual-heuristic dynamic programming, which solve for the value function and the costate of the game, respectively. An $H_{\infty}$ autopilot design for an F-16 aircraft is presented to illustrate the results.
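To make the "Riccati equation solved forward in time" idea concrete, the following is a minimal model-based sketch, not the correspondence's HDP or DHP scheme itself. It assumes linear dynamics x[k+1] = A x[k] + B u[k] + E w[k] and a quadratic stage cost x'Qx + u'Ru - gamma^2 w'w, and iterates the game Riccati recursion from P = 0 until it reaches a fixed point. The matrices, the value of gamma, and the function names are hypothetical choices for illustration; they are not the F-16 model from the paper.

```python
import numpy as np


def game_riccati_step(P, A, B, E, Q, R, gamma):
    """One value-iteration update of the zero-sum game Riccati recursion."""
    m, q = B.shape[1], E.shape[1]
    G = np.hstack([B, E])  # stacked input matrix [B E]
    # Block weight: control is penalised by R, disturbance enters with -gamma^2 I.
    W = np.block([
        [R, np.zeros((m, q))],
        [np.zeros((q, m)), -gamma**2 * np.eye(q)],
    ])
    M = W + G.T @ P @ G
    return Q + A.T @ P @ A - A.T @ P @ G @ np.linalg.solve(M, G.T @ P @ A)


def solve_game_riccati(A, B, E, Q, R, gamma, tol=1e-10, max_iter=10_000):
    """Iterate forward from P = 0 until successive iterates agree within tol."""
    P = np.zeros_like(Q, dtype=float)
    for _ in range(max_iter):
        P_next = game_riccati_step(P, A, B, E, Q, R, gamma)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("value iteration did not converge")


# Hypothetical stable two-state system (illustration only).
A = np.array([[0.5, 0.1],
              [0.0, 0.6]])
B = np.array([[0.0],
              [1.0]])   # control input channel
E = np.array([[0.1],
              [0.1]])   # disturbance input channel
Q = np.eye(2)
R = np.eye(1)
gamma = 2.0             # assumed attenuation level, chosen well above the achievable bound

P = solve_game_riccati(A, B, E, Q, R, gamma)
print("P =\n", P)
```

The value function of the game is then approximated by V(x) = x'Px. In this sketch the model (A, B, E) is used directly, whereas the adaptive critic schemes summarized above are described as learning the value function or costate forward in time.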

[1] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .

[2] Bernard Widrow,et al. Punish/Reward: Learning with a Critic in Adaptive Threshold Systems , 1973, IEEE Trans. Syst. Man Cybern.

[3] Joe Brewer,et al. Kronecker products and matrix calculus in system theory , 1978 .

[4] T. Başar,et al. Dynamic Noncooperative Game Theory , 1982 .

[5] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[6] Frank L. Lewis,et al. Applied Optimal Control and Estimation , 1992 .

[7] A. Weeren,et al. The discrete time Riccati equation related to the H∞ control problem , 1992, 1992 American Control Conference.

[8] Frank L. Lewis,et al. Aircraft Control and Simulation , 1992 .

[9] Andrew G. Barto,et al. Adaptive linear quadratic control using policy iteration , 1994, Proceedings of 1994 American Control Conference - ACC '94.

[10] A. Weeren,et al. The discrete-time Riccati equation related to the H∞ control problem , 1994, IEEE Trans. Autom. Control.

[11] Richard S. Sutton,et al. A Menu of Designs for Reinforcement Learning Over Time , 1995 .

[12] Tamer Başar,et al. H∞-Optimal Control and Related Minimax Design Problems , 1995 .

[13] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst.

[14] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[15] Tomas Landelius,et al. Reinforcement Learning and Distributed Local Model Synthesis , 1997 .

[16] Michael P. Wellman,et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm , 1998, ICML.

[17] S.H.G. ten Hagen,et al. Linear Quadratic Regulation using reinforcement learning , 1998 .

[18] Michael L. Littman,et al. Value-function reinforcement learning in Markov games , 2001, Cognitive Systems Research.

[19] George G. Lendaris,et al. Adaptive dynamic programming , 2002, IEEE Trans. Syst. Man Cybern. Part C.

[20] F. Lewis,et al. Hamilton-Jacobi-Isaacs formulation for constrained input nonlinear systems , 2004, 2004 43rd IEEE Conference on Decision and Control (CDC).

[21] Warren B. Powell,et al. Handbook of Learning and Approximate Dynamic Programming , 2006, IEEE Transactions on Automatic Control.

[22] Victor M. Becerra,et al. Optimal control , 2008, Scholarpedia.