论文信息 - Reinforcement Learning Applied to Linear Quadratic Regulation

Reinforcement Learning Applied to Linear Quadratic Regulation

Recent research on reinforcement learning has focused on algorithms based on the principles of Dynamic Programming (DP). One of the most promising areas of application for these algorithms is the control of dynamical systems, and some impressive results have been achieved. However, there are significant gaps between practice and theory. In particular, there are no convergence proofs for problems with continuous state and action spaces, or for systems involving non-linear function approximators (such as multilayer perceptrons). This paper presents research applying DP-based reinforcement learning theory to Linear Quadratic Regulation (LQR), an important class of control problems involving continuous state and action spaces and requiring a simple type of non-linear function approximator. We describe an algorithm based on Q-learning that is proven to converge to the optimal controller for a large class of LQR problems. We also describe a slightly different algorithm that is only locally convergent to the optimal Q-function, demonstrating one of the possible pitfalls of using a non-linear function approximator with DP-based learning.

Steven J. Bradtke | S. Bradtke

[1] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .

[2] D. Kleinman. On an iterative technique for Riccati equation computations , 1968 .

[3] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .

[4] Paul J. Werbos,et al. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research , 1987, IEEE Transactions on Systems, Man, and Cybernetics.

[5] Michael I. Jordan,et al. Learning to Control an Unstable System with Forward Modeling , 1989, NIPS.

[6] Donald A. Sofge,et al. Neural network based process optimization and control , 1990, 29th IEEE Conference on Decision and Control.