A reinforcement learning-based scheme for adaptive optimal control of linear stochastic systems

Reinforcement learning, in which decision-making agents learn optimal policies through interaction with their environment, is an attractive paradigm for direct, adaptive controller design. However, results for systems with continuous variables are rare. Here, we generalize previous work on deterministic linear systems to stochastic ones, since uncertainty is almost always present and must be accounted for to ensure good closed-loop performance. We present convergence results and show an example suggesting automatic controller order reduction. We also highlight key differences between the algorithms for deterministic and stochastic systems.
