Paper information - Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach


Abstract: In this paper, we propose an online Q-learning algorithm to solve the infinite-horizon optimal control problem of a linear time-invariant system with completely uncertain/unknown dynamics. We first formulate the Q-function by using the Hamiltonian and the optimal cost. An integral reinforcement learning approach is then used to develop an actor/critic approximator structure that estimates the parameters of the Q-function online while also guaranteeing closed-loop asymptotic stability and convergence to the optimal solution.
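To make the phrase "formulate the Q-function by using the Hamiltonian and the optimal cost" concrete, here is a minimal sketch for a scalar linear system dx/dt = a*x + b*u with running cost q*x^2 + r*u^2. It assumes one common construction, Q(x, u) = V*(x) + H(x, u, dV*/dx) with V*(x) = p*x^2, which the abstract does not spell out. The function names and the numeric values are hypothetical. The sketch also uses the known a and b to compute p from the scalar Riccati equation, purely as a reference check; the algorithm described in the abstract is model-free and would estimate the Q-function parameters online instead.

```python
import math


def scalar_are(a, b, q, r):
    """Positive root p of the scalar Riccati equation 2*a*p - (b*p)**2 / r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


def q_function_params(a, b, q, r):
    """Coefficients of Q(x, u) = q_xx*x^2 + 2*q_xu*x*u + q_uu*u^2.

    Assumed construction: Q(x, u) = V*(x) + H(x, u, dV*/dx), where
    V*(x) = p*x^2 and H = q*x^2 + r*u^2 + 2*p*x*(a*x + b*u).
    """
    p = scalar_are(a, b, q, r)
    q_xx = p + q + 2.0 * a * p
    q_xu = p * b
    q_uu = r
    return p, q_xx, q_xu, q_uu


def q_value(x, u, q_xx, q_xu, q_uu):
    """Evaluate the quadratic Q-function at state x and control u."""
    return q_xx * x * x + 2.0 * q_xu * x * u + q_uu * u * u


def greedy_control(x, q_xu, q_uu):
    """Control that minimizes Q(x, u) over u; needs only Q-function coefficients."""
    return -(q_xu / q_uu) * x


# Hypothetical example: an open-loop unstable plant (a > 0).
a, b, q, r = 1.0, 1.0, 1.0, 1.0
p, q_xx, q_xu, q_uu = q_function_params(a, b, q, r)
x0 = 2.0
u_star = greedy_control(x0, q_xu, q_uu)
closed_loop_pole = a - b * (q_xu / q_uu)
```

Under these assumptions the minimizing control depends only on q_xu and q_uu, which is why learning the Q-function coefficients can stand in for knowing a and b. At the minimizer the Hamiltonian term vanishes, so Q(x, u*) reduces to the optimal cost p*x^2.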

Kyriakos G. Vamvoudakis
