Optimization of the Model Predictive Control Update Interval Using Reinforcement Learning

In control applications, a compromise must often be struck between the complexity and performance of the controller and the available computational resources. For instance, the typical hardware platform in embedded control applications is a microcontroller with limited memory and processing power, and in battery-powered applications the control system can account for a significant portion of the energy consumption. We propose a controller architecture in which the computational cost is explicitly optimized along with the control objective. This is achieved by a three-part architecture: a high-level, computationally expensive controller generates plans; a computationally simpler controller executes those plans while compensating for prediction errors; and a recomputation policy decides when the plan should be recomputed. In this paper, we employ model predictive control (MPC) as the high-level plan-generating controller, a linear state feedback controller as the simpler compensating controller, and reinforcement learning (RL) to learn the recomputation policy. Simulation results for two examples showcase the architecture's ability to improve upon the standard MPC approach and to find reasonable compromises between performance on the control objective and the computational resources expended.
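To make the three-part architecture concrete, the sketch below shows the shape of the resulting control loop. It is a minimal, hypothetical illustration in Python/NumPy, not the paper's implementation: solve_mpc, recompute_policy, plant_step, the feedback gain K, and the threshold parameter are all stand-ins, and the RL-trained recomputation policy is replaced here by a simple prediction-error threshold.

```python
import numpy as np

# Hypothetical placeholders; the paper's actual MPC solver, feedback
# gain, and RL policy are not specified in the abstract.

def solve_mpc(x0, horizon=20):
    """High-level controller: return a planned state/input trajectory.
    Trivial stand-in that plans to hold the current state."""
    x_plan = np.tile(x0, (horizon, 1))
    u_plan = np.zeros((horizon, 1))
    return x_plan, u_plan

def recompute_policy(x, x_pred, params):
    """Stand-in for the RL-learned recomputation policy: trigger a
    replan when the prediction error exceeds a (learned) threshold."""
    return np.linalg.norm(x - x_pred) > params["threshold"]

def plant_step(x, u):
    """Toy double-integrator plant, included only to make the loop runnable."""
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.005], [0.1]])
    return A @ x + B.flatten() * u

K = np.array([[1.0, 1.5]])       # linear state feedback gain (compensator)
params = {"threshold": 0.05}     # parameters an RL agent would learn

x = np.array([1.0, 0.0])         # initial state
x_plan, u_plan = solve_mpc(x)    # initial plan
k = 0                            # index into the current plan

for t in range(100):
    # Low-level controller: planned input plus feedback on the
    # deviation from the planned trajectory.
    u = u_plan[k, 0] - (K @ (x - x_plan[k])).item()
    x = plant_step(x, u)
    k = min(k + 1, len(u_plan) - 1)

    # Recomputation policy: replan only when triggered, trading control
    # performance against computational cost.
    if recompute_policy(x, x_plan[k], params):
        x_plan, u_plan = solve_mpc(x)
        k = 0
```

In the paper's setting, the binary trigger would instead be a policy trained with RL, with the cost of each MPC recomputation entering the reward so that the agent learns when replanning is worth its computational price.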
