Paper information - Adaptive linear quadratic control using policy iteration

Adaptive linear quadratic control using policy iteration

In this paper we present stability and convergence results for dynamic programming-based (DP-based) reinforcement learning applied to linear quadratic regulation (LQR). The specific algorithm we analyze is based on Q-learning, and it is proven to converge to an optimal controller provided that the underlying system is controllable and a particular signal vector is persistently excited. This is the first convergence result for DP-based reinforcement learning algorithms for a continuous problem.
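The abstract names the ingredients (policy iteration, a Q-function, a persistently excited signal) without showing how they fit together, so a minimal sketch may help. It is an illustration under stated assumptions, not the paper's exact procedure: the dynamics are x' = A x + B u, the stage cost is x^T E x + u^T F u, the policy is u = K x, and the Q-function of a policy is the quadratic form [x; u]^T H [x; u]. The names `quad_features`, `theta_to_H`, `q_policy_iteration`, `gamma`, and `noise_std` are hypothetical, and the batch least-squares fit is a simplification chosen for brevity. The learner uses A and B only to simulate data; Gaussian noise added to the control stands in for the persistent excitation condition.

```python
import numpy as np

def quad_features(z):
    """Map z to features phi(z) with z^T H z == phi(z) @ theta,
    where theta is the upper triangle of the symmetric matrix H."""
    i, j = np.triu_indices(len(z))
    # Off-diagonal entries of H appear twice in the quadratic form.
    scale = np.where(i == j, 1.0, 2.0)
    return scale * np.outer(z, z)[i, j]

def theta_to_H(theta, size):
    """Rebuild the symmetric matrix H from its upper triangle."""
    H = np.zeros((size, size))
    i, j = np.triu_indices(size)
    H[i, j] = theta
    H[j, i] = theta
    return H

def q_policy_iteration(A, B, E, F, K0, gamma=1.0, n_iters=10,
                       n_samples=200, noise_std=1.0, seed=0):
    """Policy iteration on the Q-function of an LQR problem.

    A and B are used only to generate transitions (assumed unknown
    to the learner). K0 is assumed to be a stabilizing initial gain.
    """
    rng = np.random.default_rng(seed)
    n, m = B.shape
    K = K0.copy()
    for _ in range(n_iters):
        rows, costs = [], []
        x = rng.normal(size=n)
        for _ in range(n_samples):
            # Exploration noise keeps the regressors excited.
            u = K @ x + noise_std * rng.normal(size=m)
            x_next = A @ x + B @ u
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, K @ x_next])
            # Bellman equation for the current policy:
            # Q(x, u) - gamma * Q(x', K x') = cost(x, u)
            rows.append(quad_features(z) - gamma * quad_features(z_next))
            costs.append(x @ E @ x + u @ F @ u)
            x = x_next
        # Policy evaluation: least-squares fit of the entries of H.
        theta, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
        H = theta_to_H(theta, n + m)
        # Policy improvement: minimize the quadratic Q over u.
        K = -np.linalg.solve(H[n:, n:], H[n:, :n])
    return K, H

# Hypothetical two-state, one-input example (values chosen for illustration).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
E = np.eye(2)
F = np.eye(1)
K, H = q_policy_iteration(A, B, E, F, K0=np.zeros((1, 2)))
```

In this example the open-loop system is stable, so the zero gain is a valid starting policy. With deterministic dynamics and enough excitation, the learned gain `K` can be checked against the gain obtained by iterating the discrete-time Riccati equation with the true A and B.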
