Simultaneous policy update algorithms for learning the solution of linear continuous-time H∞ state feedback control

It is well known that the H∞ state feedback control problem can be viewed as a two-player zero-sum game and reduced to finding a solution of the algebraic Riccati equation (ARE). In this paper, we propose a simultaneous policy update algorithm (SPUA) for solving the ARE, and develop offline and online versions. The offline SPUA is a model-based approach that obtains the solution of the ARE by solving a sequence of Lyapunov equations (LEs). Its convergence is established rigorously by constructing a Newton sequence for the associated fixed-point equation. The online SPUA is a partially model-free approach that draws on the idea of reinforcement learning (RL) to learn the solution of the ARE online without requiring knowledge of the internal system dynamics, with both players updating their action policies simultaneously. The convergence of the online SPUA is proved by showing that it is mathematically equivalent to the offline SPUA. Finally, comparative simulation studies on an F-16 aircraft plant and a power system show that both the offline SPUA and the online SPUA find the solution of the ARE and achieve much better convergence than existing methods.

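To make the offline procedure concrete, the sketch below illustrates a Kleinman-type Newton iteration in which each step of the game ARE reduces to a Lyapunov equation, consistent with the description above. It is a minimal illustration only: the system matrices, cost weights, attenuation level, initial guess, and stopping tolerance are assumptions for the example and are not taken from the paper.

```python
# Minimal sketch: Newton/Kleinman-type iteration for the H-infinity game ARE
#   A'P + PA + Q - P (B R^{-1} B' - gamma^{-2} E E') P = 0,
# where each iterate is obtained from a Lyapunov equation.
# All matrices and parameters below are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def offline_spua(A, B, E, Q, R, gamma, P0, max_iter=50, tol=1e-10):
    """Iteratively approximate the game ARE solution via Lyapunov equations."""
    Rinv = np.linalg.inv(R)
    # Sign-indefinite quadratic term combining control and disturbance players.
    M = B @ Rinv @ B.T - (1.0 / gamma**2) * E @ E.T
    P = P0
    for _ in range(max_iter):
        Ac = A - M @ P                         # closed loop after both policies update
        # Solve Ac' P_next + P_next Ac + Q + P M P = 0  (a Lyapunov equation).
        P_next = solve_continuous_lyapunov(Ac.T, -(Q + P @ M @ P))
        if np.linalg.norm(P_next - P, ord="fro") < tol:
            return P_next
        P = P_next
    return P

# Illustrative 2-state example with scalar control and disturbance inputs.
A = np.array([[-1.0, 1.0], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])
E = np.array([[1.0], [0.0]])
Q = np.eye(2); R = np.eye(1); gamma = 5.0
P = offline_spua(A, B, E, Q, R, gamma, P0=np.zeros((2, 2)))
K = np.linalg.solve(R, B.T @ P)                # state-feedback gain, u = -K x
```

In this sketch the initial iterate is taken so that the initial closed-loop matrix is stable; in the Newton–Kantorovich setting referred to in the abstract, such a stabilizing starting point is what guarantees that the sequence of Lyapunov solutions converges to the stabilizing solution of the ARE.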