Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach

Abstract: In this paper, we propose an online Q-learning algorithm to solve the infinite-horizon optimal control problem for a linear time-invariant system with completely unknown dynamics. We first formulate the Q-function in terms of the Hamiltonian and the optimal cost. An integral reinforcement learning approach is then used to develop an actor/critic approximator structure that estimates the parameters of the Q-function online, while guaranteeing closed-loop asymptotic stability and convergence to the optimal solution.
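As an illustration of the Q-function construction mentioned in the abstract, the following is a minimal sketch for a scalar system dx/dt = a x + b u with cost integrand m x^2 + r u^2. Everything in it is an assumption made for illustration and is not taken from the paper: the scalar setting, the weights m and r, the convention V*(x) = p x^2 (no factor of 1/2), and all function names. The model parameters a and b are used here only to build a reference Q-function; the online actor/critic estimation described in the abstract is not reproduced.

```python
import math


def scalar_are(a, b, m, r):
    """Stabilizing solution p > 0 of the scalar Riccati equation
    2*a*p + m - (b*p)**2 / r = 0."""
    return r * (a + math.sqrt(a * a + b * b * m / r)) / (b * b)


def q_blocks(a, b, m, r):
    """Blocks of Q(x, u) = V*(x) + H(x, u, dV*/dx) with V*(x) = p*x**2.

    H(x, u) = m*x**2 + r*u**2 + 2*p*x*(a*x + b*u), so
    Q(x, u) = q_xx*x**2 + 2*q_xu*x*u + q_uu*u**2.
    """
    p = scalar_are(a, b, m, r)
    q_xx = p + m + 2.0 * a * p
    q_xu = p * b
    q_uu = r
    return p, q_xx, q_xu, q_uu


def q_value(x, u, blocks):
    """Evaluate the quadratic Q-function at (x, u)."""
    _, q_xx, q_xu, q_uu = blocks
    return q_xx * x * x + 2.0 * q_xu * x * u + q_uu * u * u


def greedy_gain(blocks):
    """Gain K of the minimizing control u = -K*x, read off the Q blocks only."""
    _, _, q_xu, q_uu = blocks
    return q_xu / q_uu


# Hypothetical example values: an unstable plant (a > 0).
blocks = q_blocks(a=1.0, b=1.0, m=3.0, r=1.0)
p = blocks[0]
K = greedy_gain(blocks)
x = 2.0
u_star = -K * x
```

In this sketch the gain K is computed from the blocks q_xu and q_uu alone, which is why estimating the Q-function parameters is enough to obtain the control without knowing a or b. At the minimizing input the Hamiltonian vanishes, so q_value(x, u_star, blocks) equals the optimal cost p * x**2, and any other input gives a larger value.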
