On integral generalized policy iteration for continuous-time linear quadratic regulations

This paper mathematically analyzes integral generalized policy iteration (I-GPI) algorithms applied to a class of continuous-time linear quadratic regulation (LQR) problems with an unknown system matrix A. GPI is the general scheme of interleaving the policy evaluation and policy improvement steps of policy iteration (PI) to compute the optimal policy. We first introduce the update horizon T, and then show that (i) all I-GPI methods with the same T can be considered equivalent and that (ii) the value function approximated in the policy evaluation step converges monotonically to the exact one as T → ∞. This reveals the relation between the computational complexity and the update (or time) horizon of I-GPI, as well as the relation between I-PI and I-GPI in the limit T → ∞. We also identify and discuss two modes of convergence of I-GPI: in one mode it behaves like PI, and in the other it performs like value iteration for discrete-time LQR and like infinitesimal GPI (T → 0). From these results, a new classification of integral reinforcement learning is formed with respect to T. Two matrix-inequality conditions for stability, the region of local monotone convergence, and data-driven (adaptive) implementation methods are also provided and discussed in detail. Numerical simulations are carried out for verification and further investigation.
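The interplay described above (a finite number of horizon-T policy-evaluation updates followed by one policy-improvement step) can be made concrete at the matrix level. Below is a minimal model-based sketch of such a recursion; the function name `igpi_lqr` and the parameter choices (T, number of evaluation sweeps, iteration count) are illustrative, not taken from the paper. A and B are assumed known here purely so the horizon-T cost integral can be computed in closed form via matrix exponentials; the paper's data-driven setting instead realizes the same policy-evaluation update from measured trajectory costs, without knowledge of A.

```python
# Minimal sketch of a matrix-level I-GPI recursion for continuous-time LQR.
# Model-based for illustration only: A is assumed known, unlike the paper's
# data-driven (adaptive) setting.
import numpy as np
from scipy.linalg import expm, solve

def igpi_lqr(A, B, Q, R, K0, T=0.5, sweeps=3, iters=50):
    """Alternate 'sweeps' partial policy-evaluation updates with update
    horizon T and one policy-improvement step, repeated 'iters' times.
    K0 must be a stabilizing initial gain."""
    n = A.shape[0]
    K, P = K0, np.zeros((n, n))
    for _ in range(iters):
        Ac = A - B @ K                 # closed-loop matrix under current policy
        M = Q + K.T @ R @ K            # running-cost weight under current policy
        # Van Loan's method: one matrix exponential yields both e^{Ac T} and
        # the finite-horizon cost integral  int_0^T e^{Ac' s} M e^{Ac s} ds.
        F = expm(np.block([[-Ac.T, M], [np.zeros((n, n)), Ac]]) * T)
        Ad = F[n:, n:]                 # e^{Ac T}
        W = Ad.T @ F[:n, n:]           # horizon-T cost integral
        for _ in range(sweeps):        # partial policy evaluation (the GPI step)
            P = W + Ad.T @ P @ Ad      # integral Bellman recursion over horizon T
        K = solve(R, B.T @ P)          # policy improvement: K = R^{-1} B' P
    return P, K

# Usage on a stable toy system (K0 = 0 is then admissible):
A = np.array([[0., 1.], [-1., -2.]])
B = np.array([[0.], [1.]])
P, K = igpi_lqr(A, B, np.eye(2), np.eye(1), K0=np.zeros((1, 2)))
# P should approach the stabilizing solution of the continuous-time ARE,
# cf. scipy.linalg.solve_continuous_are(A, B, np.eye(2), np.eye(1)).
```

The two limiting regimes in the abstract are visible in this sketch: letting the evaluation converge (sweeps → ∞, or T → ∞) reduces each outer step to solving a Lyapunov equation, i.e., PI in the sense of Kleinman, whereas a single sweep with small T behaves like a value-iteration-style update.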
