Learning Q-Function Approximations for Hybrid Control Problems

The main challenge in controlling hybrid systems arises from having to consider an exponential number of sequences of future modes to make good long-term decisions. Model predictive control (MPC) computes a control action by solving a finite-horizon optimisation problem. A key ingredient of this problem is a terminal cost, which accounts for the system’s evolution beyond the chosen horizon. A well-chosen terminal cost can shorten the horizon needed for good control actions, but it is often tuned empirically by observing performance. We build on the idea of using $N$-step $Q$-functions ($\mathcal{Q}^{(N)}$) in the MPC objective to avoid having to choose a terminal cost. We present a formulation that incorporates the system dynamics and constraints to approximate the optimal $\mathcal{Q}^{(N)}$-function, together with algorithms that train the approximation parameters by exploring the state space. We test the control policy derived from the trained approximations on two benchmark problems in simulation and observe that our algorithms learn good $\mathcal{Q}^{(N)}$-approximations for hybrid systems with dimensions of practical relevance from a relatively small dataset. We compare our controller’s performance against that of Hybrid MPC in terms of computation time and closed-loop cost.
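
To make the role of the terminal cost concrete, the following is a generic sketch of a finite-horizon MPC problem and of one common way an $N$-step $Q$-function can stand in for it. The symbols $\ell$ (stage cost), $V_f$ (terminal cost), $f$ (dynamics), $\mathcal{C}$ (constraint set) and $V^{\star}$ (optimal cost-to-go beyond the horizon) are assumptions introduced here for illustration and do not appear in the abstract; the exact definition used in the paper may differ.

```latex
% Generic finite-horizon MPC problem with a hand-chosen terminal cost V_f (assumed notation)
\begin{aligned}
\min_{u_0,\dots,u_{N-1}} \quad & \sum_{k=0}^{N-1} \ell(x_k, u_k) + V_f(x_N) \\
\text{s.t.} \quad & x_{k+1} = f(x_k, u_k), \quad (x_k, u_k) \in \mathcal{C}, \quad k = 0,\dots,N-1, \\
& x_0 = x .
\end{aligned}

% One possible N-step Q-function (illustrative assumption): the same N stage costs,
% with V_f replaced by the optimal cost-to-go V^\star from the state x_N reached
% under the dynamics above
\mathcal{Q}^{(N)}(x, u_0, \dots, u_{N-1})
  = \sum_{k=0}^{N-1} \ell(x_k, u_k) + V^{\star}(x_N), \qquad x_0 = x .
```

Under these assumptions, minimising an approximation of $\mathcal{Q}^{(N)}$ over the inputs plays the role of the MPC objective, so no separate terminal cost needs to be tuned by hand.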
