A reinforcement learning-based scheme for adaptive optimal control of linear stochastic systems

Reinforcement learning, in which decision-making agents learn optimal policies through interaction with their environment, is an attractive paradigm for direct adaptive controller design. However, results for systems with continuous variables are rare. Here, we generalize previous work on deterministic linear systems to stochastic ones, since uncertainty is almost always present and must be accounted for to ensure good closed-loop performance. We present convergence results and an example suggesting automatic controller order reduction. We also highlight key differences between the algorithms for deterministic and stochastic systems.
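The abstract describes learning an optimal controller for a stochastic linear system directly from interaction data. As a minimal sketch of that general idea, and not the authors' algorithm, the Python/NumPy code below runs least-squares policy iteration on a quadratic Q-function for a scalar discounted LQR problem with additive Gaussian process noise. The system, cost weights, discount factor, noise levels and every function name (`simulate`, `lstdq`, `lspi_lqr`, `riccati_gain`) are illustrative assumptions. Two details reflect the stochastic setting, though they are not necessarily the differences the paper emphasizes. First, the Q-function carries a constant offset caused by the noise. Second, policy evaluation uses LSTD with the current features as instruments, because ordinary least squares on the temporal-difference regression is biased when the noise enters the next-step features.

```python
import numpy as np


def quad_features(Z):
    """Row-wise quadratic monomials z_i z_j (i <= j) plus a constant 1.

    The constant column absorbs the offset that process noise adds to the
    Q-function; for a noise-free system that offset would be zero.
    """
    i, j = np.triu_indices(Z.shape[1])
    return np.hstack([Z[:, i] * Z[:, j], np.ones((Z.shape[0], 1))])


def theta_to_H(theta, n):
    """Rebuild the symmetric kernel H of Q(z) = z'Hz + c from the weights."""
    H = np.zeros((n, n))
    H[np.triu_indices(n)] = theta[:-1]
    # Off-diagonal weights multiply z_i z_j once, i.e. they equal 2 H_ij.
    return (H + H.T) / 2.0


def simulate(A, B, Qc, Rc, N, sigma_w, sigma_e, rng):
    """Collect data from x+ = Ax + Bu + w under purely exploratory inputs."""
    nx, nu = B.shape
    X, U, Xn = np.zeros((N, nx)), np.zeros((N, nu)), np.zeros((N, nx))
    x = np.zeros(nx)
    for k in range(N):
        u = sigma_e * rng.standard_normal(nu)  # behaviour policy: white noise
        x_next = A @ x + B @ u + sigma_w * rng.standard_normal(nx)
        X[k], U[k], Xn[k] = x, u, x_next
        x = x_next
    cost = np.einsum("ki,ij,kj->k", X, Qc, X) + np.einsum("ki,ij,kj->k", U, Rc, U)
    return X, U, Xn, cost


def lstdq(X, U, Xn, cost, K, gamma):
    """Off-policy LSTD estimate of the Q-function of the policy u = -Kx.

    phi(z_k) serves as the instrument, which keeps the estimate consistent
    even though the noise w_k enters the next-step features phi(z_{k+1}).
    """
    Phi = quad_features(np.hstack([X, U]))
    Phi_next = quad_features(np.hstack([Xn, -Xn @ K.T]))  # next action from K
    theta = np.linalg.solve(Phi.T @ (Phi - gamma * Phi_next), Phi.T @ cost)
    return theta_to_H(theta, X.shape[1] + U.shape[1]), theta[-1]


def lspi_lqr(X, U, Xn, cost, K0, gamma, iters=10):
    """Policy iteration: evaluate with LSTD-Q, then improve greedily."""
    nx = X.shape[1]
    K = K0
    for _ in range(iters):
        H, _ = lstdq(X, U, Xn, cost, K, gamma)
        # Minimizing z'Hz over u gives u = -H_uu^{-1} H_ux x.
        K = np.linalg.solve(H[nx:, nx:], H[nx:, :nx])
    return K


def riccati_gain(A, B, Qc, Rc, gamma, iters=1000):
    """Model-based reference: discounted Riccati recursion (used only for checking)."""
    P = Qc.copy()
    for _ in range(iters):
        K = gamma * np.linalg.solve(Rc + gamma * B.T @ P @ B, B.T @ P @ A)
        P = Qc + gamma * A.T @ P @ A - gamma * A.T @ P @ B @ K
    return K


rng = np.random.default_rng(0)
A, B = np.array([[0.8]]), np.array([[1.0]])
Qc, Rc, gamma = np.eye(1), np.eye(1), 0.8
X, U, Xn, cost = simulate(A, B, Qc, Rc, 20000, sigma_w=0.1, sigma_e=1.0, rng=rng)
K_learned = lspi_lqr(X, U, Xn, cost, K0=np.zeros((1, 1)), gamma=gamma)
K_star = riccati_gain(A, B, Qc, Rc, gamma)
print("learned K:", K_learned.ravel(), "Riccati K:", K_star.ravel())
```

The learner never uses `A` or `B`; the Riccati gain is computed only to check the result. Reusing one batch of exploratory data for every evaluation step is an off-policy shortcut taken here for brevity.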

Wee Chin Wong | Jay H. Lee