Adaptive Critic Designs for Discrete-Time Zero-Sum Games With Application to $H_{\infty}$ Control

In this correspondence, adaptive critic approximate dynamic programming designs are derived to solve the discrete-time zero-sum game in which the state and action spaces are continuous. The result is a forward-in-time reinforcement learning algorithm that converges to the Nash equilibrium of the corresponding zero-sum game. The results in this correspondence can be thought of as a way to solve the Riccati equation of the well-known discrete-time $H_{\infty}$ optimal control problem forward in time. Two schemes are presented, namely: 1) heuristic dynamic programming (HDP) and 2) dual heuristic dynamic programming (DHP), to solve for the value function and the costate of the game, respectively. An $H_{\infty}$ autopilot design for an F-16 aircraft is presented to illustrate the results.
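To make the connection to the Riccati equation concrete, the following is a sketch of the linear-quadratic specialization; the notation ($A$, $B$, $E$, $Q$, $R$, $\gamma$) is assumed from the standard discrete-time $H_{\infty}$ setup rather than reproduced from the correspondence itself. For the plant $x_{k+1} = A x_k + B u_k + E w_k$ with quadratic value function $V_i(x) = x^{T} P_i x$ initialized at $P_0 = 0$, the HDP backup
$$
V_{i+1}(x_k) = \min_{u_k}\,\max_{w_k}\bigl[\, x_k^{T} Q x_k + u_k^{T} R u_k - \gamma^{2} w_k^{T} w_k + V_i(x_{k+1}) \,\bigr]
$$
reduces to the forward-in-time recursion on the value-function kernel
$$
P_{i+1} = Q + A^{T} P_i A -
\begin{bmatrix} A^{T} P_i B & A^{T} P_i E \end{bmatrix}
\begin{bmatrix} R + B^{T} P_i B & B^{T} P_i E \\ E^{T} P_i B & E^{T} P_i E - \gamma^{2} I \end{bmatrix}^{-1}
\begin{bmatrix} B^{T} P_i A \\ E^{T} P_i A \end{bmatrix},
$$
whose fixed point is the game algebraic Riccati equation of the discrete-time $H_{\infty}$ problem. The saddle-point (Nash) policies $u_k = -K_u x_k$ and $w_k = -K_w x_k$ follow from the same block matrices, provided $R + B^{T} P B > 0$ and $E^{T} P E - \gamma^{2} I < 0$, i.e., $\gamma$ exceeds the critical attenuation level.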
