Stochastic Linear Quadratic Optimal Control Problem: A Reinforcement Learning Method

This paper studies an infinite-horizon stochastic linear quadratic (LQ) optimal control problem using a reinforcement learning method. An online algorithm, based on Bellman's dynamic programming principle, is presented to obtain the optimal control. The algorithm does not require full knowledge of the system's internal structure. Working within a direct adaptive optimal control scheme, it evaluates the reinforcement signals and updates the feedback control until the control no longer improves the cost functional. The algorithm is implemented, and a numerical example is provided to illustrate the theoretical results.
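The evaluate-then-improve loop described in the abstract can be illustrated with a minimal policy iteration sketch. It is not the paper's online algorithm: it is a model-based variant that assumes all coefficients are known, whereas the paper's method avoids needing full knowledge of the system. The state equation dX = (AX + Bu) dt + (CX + Du) dW, the cost weights Q and R, and every function name below are assumptions introduced for illustration; the abstract does not specify them.

```python
import numpy as np

# Assumed model (not stated in the abstract):
#   dX = (A X + B u) dt + (C X + D u) dW,  cost = E int (X'QX + u'Ru) dt,
# with linear feedback u = K X.

def evaluate_policy(A, B, C, D, Q, R, K):
    """Policy evaluation: solve the generalized Lyapunov equation
    Ak'P + P Ak + Ck'P Ck + Q + K'RK = 0 for P by vectorization."""
    n = A.shape[0]
    Ak = A + B @ K
    Ck = C + D @ K
    M = Q + K.T @ R @ K
    I = np.eye(n)
    # Row-major identity: vec(X P Y) = kron(X, Y') vec(P)
    L = np.kron(Ak.T, I) + np.kron(I, Ak.T) + np.kron(Ck.T, Ck.T)
    P = np.linalg.solve(L, -M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def improve_policy(B, C, D, R, P):
    """Policy improvement: K = -(R + D'PD)^{-1} (B'P + D'PC)."""
    return -np.linalg.solve(R + D.T @ P @ D, B.T @ P + D.T @ P @ C)

def policy_iteration(A, B, C, D, Q, R, K0, tol=1e-10, max_iter=50):
    """Alternate evaluation and improvement until P stops changing.
    K0 is assumed to be mean-square stabilizing."""
    K, P_prev = K0, None
    for _ in range(max_iter):
        P = evaluate_policy(A, B, C, D, Q, R, K)
        K = improve_policy(B, C, D, R, P)
        if P_prev is not None and np.linalg.norm(P - P_prev) < tol:
            break
        P_prev = P
    return P, K

def riccati_residual(A, B, C, D, Q, R, P):
    """Residual of the stochastic algebraic Riccati equation at P."""
    G = B.T @ P + D.T @ P @ C
    return (A.T @ P + P @ A + C.T @ P @ C + Q
            - G.T @ np.linalg.solve(R + D.T @ P @ D, G))

if __name__ == "__main__":
    # Hypothetical two-state example; the numbers are illustrative only.
    A = np.array([[-1.0, 0.5], [0.0, -2.0]])
    B = np.array([[0.0], [1.0]])
    C = 0.3 * np.eye(2)
    D = np.array([[0.0], [0.2]])
    Q, R = np.eye(2), np.eye(1)
    P, K = policy_iteration(A, B, C, D, Q, R, K0=np.zeros((1, 2)))
    print("Riccati residual norm:",
          np.linalg.norm(riccati_residual(A, B, C, D, Q, R, P)))
```

The stopping rule mirrors the abstract's criterion: iteration ends once a further update of the feedback gain no longer changes the value matrix P, and hence no longer improves the cost.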
