Tailored neural networks for learning optimal value functions in MPC

Learning-based predictive control is a promising alternative to optimization-based MPC. However, efficiently learning the optimal control policy, the optimal value function, or the Q-function requires suitable function approximators. Artificial neural networks (ANNs) are often considered, but choosing a suitable topology is non-trivial. Against this background, it has recently been shown that tailored ANNs can, in principle, exactly represent the optimal control policy of linear MPC by exploiting its piecewise affine structure. In this paper, we provide a similar result for representing the optimal value function and the Q-function, both of which are known to be piecewise quadratic for linear MPC.
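The observation that a piecewise affine (PWA) control law can be written exactly as a small ReLU network can be illustrated with a minimal sketch. The example below is not the paper's construction; it is a hypothetical 1-D case using a saturated linear feedback, which is PWA, and the elementary identities min(a, b) = a − relu(a − b) and max(a, b) = a + relu(b − a). The gain `k` and bounds `u_min`, `u_max` are made-up values for illustration.

```python
import numpy as np

def relu(z):
    # ReLU activation, applied elementwise
    return np.maximum(z, 0.0)

# Hypothetical 1-D saturated feedback u = sat(k*x): piecewise affine,
# hence exactly representable by a two-layer ReLU network.
k, u_min, u_max = -0.8, -1.0, 1.0

def sat_feedback(x):
    # Reference implementation via clipping
    return np.clip(k * x, u_min, u_max)

def relu_net(x):
    # min(a, b) = a - relu(a - b); max(a, b) = a + relu(b - a)
    a = k * x
    a = a - relu(a - u_max)   # min(k*x, u_max)
    a = a + relu(u_min - a)   # max(..., u_min)
    return a

xs = np.linspace(-5.0, 5.0, 101)
assert np.allclose(sat_feedback(xs), relu_net(xs))  # exact match, not an approximation
```

The same composition trick extends to higher-dimensional PWA maps; representing the piecewise *quadratic* value function additionally requires units capable of producing products of inputs, which motivates the tailored topologies studied in the paper.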
