Learning Optimal Control with MPC Layer

This paper explores the combination of Model Predictive Control (MPC) and Reinforcement Learning (RL). The combination is built on CVXPY, which supports differentiable convex optimization problems and allows them to be embedded as layers in machine learning models. Used as a function approximator in RL, the MPC problem constructed with CVXPY can be deployed across the main families of RL algorithms, including value-based RL, policy gradient, and actor-critic methods. We detail the combination method and present novel algorithm structures for several representative RL algorithms. The main advantages of the MPC layer in RL algorithms are its flexibility and fast convergence rate. We also provide practical tricks, including pre-training the initial parameters and computing derivatives via the Lagrangian formulation. We run experiments on the new algorithms using OpenAI Gym and PyTorch.
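
The following is a minimal sketch of the core idea described above: an MPC problem built with CVXPY and embedded as a differentiable layer (here via the cvxpylayers PyTorch interface) so that gradients of an RL loss can flow back into the MPC cost parameters. The horizon, dimensions, linear dynamics, input bound, and all variable names are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a differentiable MPC layer, assuming linear dynamics (A, B),
# a quadratic cost with learnable weights, and a box constraint on the input.
import cvxpy as cp
import numpy as np
import torch
from cvxpylayers.torch import CvxpyLayer

n, m, T = 4, 2, 10                                # state dim, input dim, horizon (assumed)
A_np = np.eye(n) + 0.1 * np.random.randn(n, n)    # assumed known dynamics
B_np = 0.1 * np.random.randn(n, m)

# CVXPY problem: variables are the state/input trajectories; parameters are
# the current state and the (learnable) square-root cost weights.
x = cp.Variable((T + 1, n))
u = cp.Variable((T, m))
x0 = cp.Parameter(n)
Q_sqrt = cp.Parameter((n, n))
R_sqrt = cp.Parameter((m, m))

cost = 0
constraints = [x[0] == x0]
for t in range(T):
    cost += cp.sum_squares(Q_sqrt @ x[t + 1]) + cp.sum_squares(R_sqrt @ u[t])
    constraints += [x[t + 1] == A_np @ x[t] + B_np @ u[t],
                    cp.norm(u[t], "inf") <= 1.0]
problem = cp.Problem(cp.Minimize(cost), constraints)

# Embed the MPC problem as a layer: the solution is differentiable
# with respect to x0, Q_sqrt, and R_sqrt.
mpc_layer = CvxpyLayer(problem, parameters=[x0, Q_sqrt, R_sqrt], variables=[u, x])

# Learnable cost parameters (what an RL algorithm would update).
Q_sqrt_t = torch.eye(n, requires_grad=True)
R_sqrt_t = torch.eye(m, requires_grad=True)

state = torch.randn(n)                        # current observed state
u_opt, _ = mpc_layer(state, Q_sqrt_t, R_sqrt_t)
action = u_opt[0]                             # MPC policy: apply the first planned input
action.sum().backward()                       # gradients w.r.t. Q_sqrt_t and R_sqrt_t
```

In an RL setting, the backward pass above would instead be driven by the algorithm's own loss (a TD error, a policy-gradient surrogate, or an actor-critic objective), with the MPC layer standing in for the usual neural-network function approximator.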