A novel generalized value iteration scheme for uncertain continuous-time linear systems

In this paper, a novel generalized value iteration (VI) technique is presented: a reinforcement learning (RL) scheme that solves the continuous-time (CT) discounted linear quadratic regulation (LQR) problem online without exact knowledge of the system matrix A. The proposed method employs a discounted value function, a setting that is standard in RL frameworks but has not been fully explored in RL for CT dynamical systems. Moreover, a stepwise-varying learning rate is introduced to achieve fast and safe convergence. In connection with this learning rate, we also discuss the locations of the closed-loop poles and the monotone convergence of the iterates to the optimal solution. These discussions yield conditions for the stability and monotone convergence of existing VI methods.
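
As a rough illustration of the kind of recursion the abstract describes, the sketch below implements a model-based generalized VI update for the discounted CT LQR, P_{k+1} = P_k + eps_k (A^T P_k + P_k A - gamma P_k + Q - P_k B R^{-1} B^T P_k), with a stepwise-varying learning rate eps_k. This is only an assumed, simplified illustration: the function names, the toy double-integrator system, the discount factor, and the step-size schedule are not from the paper, and the paper's actual scheme is online and does not require exact knowledge of A.

```python
# Minimal sketch (assumed, model-based) of generalized value iteration
# for a discounted continuous-time LQR problem. The paper's method is
# data-driven and online; this version uses A only for illustration.
import numpy as np

def riccati_residual(P, A, B, Q, R, gamma):
    """Residual of the discounted CT algebraic Riccati equation."""
    return A.T @ P + P @ A - gamma * P + Q - P @ B @ np.linalg.solve(R, B.T @ P)

def generalized_vi(A, B, Q, R, gamma, eps, iters=500):
    """Iterate P_{k+1} = P_k + eps(k) * Riccati(P_k) with a varying rate eps(k)."""
    n = A.shape[0]
    P = np.zeros((n, n))
    for k in range(iters):
        P = P + eps(k) * riccati_residual(P, A, B, Q, R, gamma)
    return P

if __name__ == "__main__":
    # Toy double-integrator example (assumed values, not from the paper).
    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)
    R = np.eye(1)
    gamma = 0.1
    eps = lambda k: 1.0 / (k + 10)        # diminishing, stepwise-varying rate
    P = generalized_vi(A, B, Q, R, gamma, eps)
    K = np.linalg.solve(R, B.T @ P)       # state-feedback gain u = -K x
    print("P =\n", P, "\nK =", K)
```

In this sketch the step-size schedule plays the role of the stepwise-varying learning rate discussed in the abstract: larger early steps speed up the iteration, while the diminishing tail is what one would tune to keep the closed-loop poles in the stable region and preserve monotone convergence of P_k.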
