Stochastic linear quadratic optimal control for model-free discrete-time systems based on Q-learning algorithm

Abstract Solving the stochastic linear quadratic (SLQ) optimal control problem generally requires full knowledge of the system dynamics. In this paper, a Q-learning iteration algorithm is adopted to solve the SLQ problem for model-free discrete-time systems. First, a condition for the well-posedness of the SLQ problem is given, and the stochastic problem is transformed into a deterministic one so that it can be solved. Second, the Q-learning iteration produces a sequence of H matrices and a sequence of control gain matrices without knowledge of the system parameters, and the convergence of both sequences is proved. Finally, two simulation examples are provided to demonstrate the effectiveness of the Q-learning algorithm.
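To make the H matrix and control gain sequences concrete, the following is a minimal sketch of the recursion they satisfy, not the paper's algorithm. It assumes a system of the form x_{k+1} = A x_k + B u_k + (C x_k + D u_k) w_k with zero-mean, unit-variance noise w_k and cost weights Q and R. That form, the function name `h_matrix_iteration`, and the numerical values are illustrative assumptions. Unlike the model-free method described in the abstract, the sketch uses A, B, C, D directly; a model-free version would instead estimate H from measured data.

```python
import numpy as np

def h_matrix_iteration(A, B, C, D, Q, R, max_iter=2000, tol=1e-12):
    """Model-based illustration of the H-matrix / gain recursion.

    Assumed (hypothetical) system: x+ = A x + B u + (C x + D u) w,
    with E[w] = 0 and E[w^2] = 1. Returns the last H, gain K, and P.
    """
    n, m = B.shape
    AB = np.hstack([A, B])          # maps [x; u] to the drift term
    CD = np.hstack([C, D])          # maps [x; u] to the noise coefficient
    G = np.block([[Q, np.zeros((n, m))],
                  [np.zeros((m, n)), R]])
    P = np.zeros((n, n))            # value matrix, started at zero
    for _ in range(max_iter):
        # Q-function kernel: Q(x, u) = [x; u]^T H [x; u]
        H = G + AB.T @ P @ AB + CD.T @ P @ CD
        # Greedy gain: K = -H_uu^{-1} H_ux
        K = -np.linalg.solve(H[n:, n:], H[n:, :n])
        # Value under the greedy gain: P = [I; K]^T H [I; K]
        IK = np.vstack([np.eye(n), K])
        P_next = IK.T @ H @ IK
        converged = np.linalg.norm(P_next - P) < tol
        P = P_next
        if converged:
            break
    return H, K, P

# Illustrative values only (not taken from the paper).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = 0.1 * np.eye(2)
D = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

H, K, P = h_matrix_iteration(A, B, C, D, Q, R)
print("gain K =", K)
```

At a fixed point, P satisfies the generalized algebraic Riccati equation associated with this assumed system, which offers one way to sanity-check the output of a data-driven implementation against a known model.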
