Learning Optimal Control with MPC Layer
This paper explores the potential of combining Model Predictive Control (MPC) with Reinforcement Learning (RL). The combination builds on CVXPY-based differentiable optimization layers, which make convex optimization problems differentiable so they can be embedded as layers in machine learning models. Serving as the function approximator in RL, the MPC problem constructed with CVXPY is deployed across all major families of RL algorithms, including value-based RL, policy gradient, and actor-critic methods. We detail the combination method and provide the novel algorithm structure for several typical RL algorithms. The major advantages of our MPC layer in RL algorithms are its flexibility and fast convergence rate. We also provide practical tricks, including pre-training the initial parameters and computing derivatives via the Lagrangian formulation. We run experiments on the new algorithms using OpenAI Gym and PyTorch.
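A minimal sketch of the idea, not the paper's exact formulation: a short-horizon linear-quadratic MPC problem is written in CVXPY and wrapped with the cvxpylayers package (CvxpyLayer) so it can act as a differentiable policy inside a PyTorch RL loop. The dynamics (A, B), horizon, bounds, and the surrogate loss at the end are illustrative assumptions; the learnable quantities here are square-root cost weights.

```python
import cvxpy as cp
import numpy as np
import torch
from cvxpylayers.torch import CvxpyLayer

n_x, n_u, T = 2, 1, 5                        # state dim, input dim, horizon (assumed)
A = np.array([[1.0, 0.1], [0.0, 1.0]])       # discrete double integrator (assumed known)
B = np.array([[0.0], [0.1]])
u_max = 1.0

# CVXPY parameters: initial state and learnable square-root cost weights
x_init = cp.Parameter(n_x)
q_sqrt = cp.Parameter(n_x)
r_sqrt = cp.Parameter(n_u)

X = cp.Variable((n_x, T + 1))
U = cp.Variable((n_u, T))

cost = 0
constraints = [X[:, 0] == x_init]
for t in range(T):
    # DPP-compliant quadratic cost: sum_squares of (parameter * variable)
    cost += cp.sum_squares(cp.multiply(q_sqrt, X[:, t + 1]))
    cost += cp.sum_squares(cp.multiply(r_sqrt, U[:, t]))
    constraints += [X[:, t + 1] == A @ X[:, t] + B @ U[:, t],
                    cp.abs(U[:, t]) <= u_max]

problem = cp.Problem(cp.Minimize(cost), constraints)
assert problem.is_dpp()                      # required for a differentiable layer

# The MPC problem becomes a layer: parameter tensors in, optimal controls out.
mpc_layer = CvxpyLayer(problem, parameters=[x_init, q_sqrt, r_sqrt], variables=[U])

# Learnable weights of the MPC layer, trained by the RL loss
q_sqrt_t = torch.ones(n_x, requires_grad=True)
r_sqrt_t = torch.ones(n_u, requires_grad=True)

x0 = torch.tensor([1.0, 0.0])                # current state from the environment
u_traj, = mpc_layer(x0, q_sqrt_t, r_sqrt_t)  # differentiable solve of the MPC problem
u0 = u_traj[:, 0]                            # first control, applied as the action

# Stand-in for an RL objective (e.g., a critic value or policy-gradient surrogate);
# gradients flow through the MPC solution back to the cost weights.
loss = (u0 ** 2).sum()
loss.backward()
print(q_sqrt_t.grad, r_sqrt_t.grad)
```

In a value-based, policy-gradient, or actor-critic setting, the same layer plays the role of the parametric function approximator: the MPC cost (and, if desired, model) parameters are the weights being updated, and the loss above would be replaced by the corresponding TD error or policy-gradient surrogate.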
[1] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[2] J. Zico Kolter, et al. OptNet: Differentiable Optimization as a Layer in Neural Networks, 2017, ICML.
[3] Stephen P. Boyd, et al. Differentiable Convex Optimization Layers, 2019, NeurIPS.
[4] Alberto Bemporad, et al. Practical Reinforcement Learning of Stabilizing Economic MPC, 2019, 18th European Control Conference (ECC).