Experience-based model predictive control using reinforcement learning

Model predictive control (MPC) is becoming an increasingly popular method for selecting actions to control dynamic systems. Traditionally, MPC uses a model of the system to be controlled and a performance function that characterizes the desired behavior of the system. The MPC agent finds actions over a finite horizon that steer the system in a desired direction. Two significant problems with conventional MPC are the amount of computation required and the suboptimality of actions chosen on the basis of a finite horizon. In this paper we propose the use of MPC to control systems that can be described as Markov decision processes. We discuss how a straightforward MPC algorithm for Markov decision processes can be implemented, and how it can be improved in both speed and decision quality by incorporating value functions. We propose the use of reinforcement learning techniques to let the agent incorporate experience from its interaction with the system into its decision making. This experience speeds up the agent's decision making significantly, and it allows the agent to base its decisions on an infinite rather than a finite horizon. The proposed approach can be beneficial for any system that can be modeled as a Markov decision process, including systems found in areas such as logistics, traffic control, and vehicle automation.
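To make the idea concrete, here is a minimal sketch (not the paper's actual algorithm) of MPC on a toy Markov decision process: the agent plans by depth-limited expectimax search over the model, a learned value function V bootstraps the search at the horizon end so that the finite-horizon decision approximates an infinite-horizon one, and a TD(0) update folds each observed transition back into V. The toy MDP, the names P, R, and mpc_action, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import random
from collections import defaultdict

# Toy MDP model (illustrative): P[s][a] maps successor states to
# probabilities, R[s][a][s2] is the immediate reward for the transition.
states  = [0, 1]
actions = ["stay", "move"]
P = {0: {"stay": {0: 0.9, 1: 0.1}, "move": {0: 0.2, 1: 0.8}},
     1: {"stay": {1: 0.9, 0: 0.1}, "move": {1: 0.2, 0: 0.8}}}
R = {s: {a: {s2: (1.0 if s2 == 1 else 0.0) for s2 in states}
         for a in actions} for s in states}

gamma, alpha, horizon = 0.95, 0.1, 3
V = defaultdict(float)   # learned value function, initially zero

def mpc_action(s):
    """Depth-limited expectimax over the model; the learned V
    bootstraps the search at the end of the finite horizon."""
    def value(s, depth):
        if depth == 0:
            return V[s]  # experience stands in for the truncated tail
        return max(qvalue(s, a, depth) for a in actions)

    def qvalue(s, a, depth):
        # Expected immediate reward plus discounted value of successors.
        return sum(p * (R[s][a][s2] + gamma * value(s2, depth - 1))
                   for s2, p in P[s][a].items())

    return max(actions, key=lambda a: qvalue(s, a, horizon))

def simulate(s, a):
    """Sample one transition from the model (stands in for the real system)."""
    s2 = random.choices(list(P[s][a]), weights=list(P[s][a].values()))[0]
    return s2, R[s][a][s2]

# Closed-loop control: plan with MPC, act, then fold the observed
# transition into V with a TD(0) update.
s = 0
for _ in range(200):
    a = mpc_action(s)
    s2, r = simulate(s, a)
    V[s] += alpha * (r + gamma * V[s2] - V[s])   # TD(0) update
    s = s2

print({s: round(V[s], 2) for s in states})
```

In this scheme, a better-trained V permits a shorter planning horizon for the same decision quality, which is where the speed-up over plain finite-horizon MPC would come from.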
