Paper information - Learning-based model predictive control for Markov decision processes


We propose the use of Model Predictive Control (MPC) for controlling systems described by Markov decision processes. First, we consider a straightforward MPC algorithm for Markov decision processes. Then, we propose using value functions as a means to deal with issues that arise in conventional MPC, e.g., computational requirements and sub-optimality of actions. We use reinforcement learning to let an MPC agent learn a value function incrementally. The agent incorporates experience from its interaction with the system into its decision making. Our approach initially relies on pure MPC. Over time, as experience increases, the learned value function is increasingly taken into account. This speeds up decision making, allows decisions to be made over an infinite instead of a finite horizon, and provides adequate control actions even if the system and the desired performance vary slowly over time.

If you want to cite this report, please use the following reference instead: R.R. Negenborn, B. De Schutter, M.A. Wiering, and H. Hellendoorn, “Learning-based model predictive control for Markov decision processes,” Proceedings of the 16th IFAC World Congress, Prague, Czech Republic, 6 pp., July 2005. Paper 2106 / We-M16-TO/2.
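The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' algorithm. Everything specific here is an assumption made for illustration: the 5-state chain MDP, the function names (`model`, `lookahead`, `q_value`, `mpc_action`, `run`), the use of a TD(0) update, and the rule that shortens the lookahead horizon as a state is visited more often. The agent plans over a finite horizon with a known model, uses the learned value function as the terminal estimate, and relies on that estimate more as experience accumulates.

```python
import random

# Hypothetical 5-state chain MDP: action 0 moves left, action 1 moves right.
# A move succeeds with probability 0.9; otherwise the state is unchanged.
# Entering the right-most (goal) state yields reward 1; all other steps yield 0.
N_STATES, ACTIONS, GAMMA = 5, (0, 1), 0.9
GOAL = N_STATES - 1


def model(state, action):
    """Known model: list of (probability, next_state, reward) outcomes."""
    if state == GOAL:  # absorbing goal state
        return [(1.0, state, 0.0)]
    target = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if target == GOAL else 0.0
    return [(0.9, target, reward), (0.1, state, 0.0)]


def q_value(state, action, horizon, value):
    """Expected return of `action` followed by the best (horizon - 1)-step plan."""
    return sum(p * (r + GAMMA * lookahead(s2, horizon - 1, value))
               for p, s2, r in model(state, action))


def lookahead(state, horizon, value):
    """Best expected return over `horizon` steps; `value` closes the horizon."""
    if horizon == 0:
        return value[state]
    return max(q_value(state, a, horizon, value) for a in ACTIONS)


def mpc_action(state, horizon, value):
    """Receding-horizon decision: apply only the first action of the best plan."""
    return max(ACTIONS, key=lambda a: q_value(state, a, horizon, value))


def run(episodes=200, max_horizon=4, alpha=0.1, seed=0):
    rng = random.Random(seed)
    value = [0.0] * N_STATES   # learned value function, initially uninformative
    visits = [0] * N_STATES    # experience counter per state
    for _ in range(episodes):
        state = 0
        for _ in range(50):    # cap on episode length
            if state == GOAL:
                break
            # Assumed schedule: shorten the horizon by one step per 20 visits,
            # so the learned values gradually replace explicit lookahead.
            horizon = max(1, max_horizon - visits[state] // 20)
            action = mpc_action(state, horizon, value)
            outcomes = model(state, action)
            _, next_state, reward = rng.choices(
                outcomes, weights=[p for p, _, _ in outcomes])[0]
            # TD(0) update of the value function from the observed transition.
            value[state] += alpha * (reward + GAMMA * value[next_state] - value[state])
            visits[state] += 1
            state = next_state
    return value, visits


if __name__ == "__main__":
    learned_value, visit_counts = run()
    print("learned values:", [round(v, 3) for v in learned_value])
    print("visit counts:  ", visit_counts)
```

With an all-zero value function, the agent needs the full four-step horizon to see the goal from state 0. Once the values are informative, a one-step lookahead is enough, which is the speed-up the abstract refers to.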
