MPC-based Reinforcement Learning for Economic Problems with Application to Battery Storage

In this paper, we are interested in optimal control problems with purely economic costs, which often yield optimal policies having a (nearly) bang-bang structure. We focus on policy approximations based on Model Predictive Control (MPC) and use the deterministic policy gradient method to optimize the MPC closed-loop performance in the presence of unmodelled stochasticity or model error. When the policy has a (nearly) bang-bang structure, we observe that the policy gradient method can struggle to produce meaningful steps in the policy parameters. To tackle this issue, we propose a homotopy strategy based on the interior-point method, which relaxes the policy during learning. We investigate a well-known battery storage problem and show that the proposed method delivers more homogeneous and faster learning than a classical policy gradient approach.
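To illustrate why a bang-bang policy can starve the policy gradient, and how an interior-point relaxation helps, the following minimal sketch uses a toy one-step problem that is not taken from the paper: minimize `theta * u` over `0 <= u <= 1`. The exact solution jumps between 0 and 1, so its sensitivity to `theta` is zero almost everywhere. Replacing the bounds with a log barrier weighted by `tau` gives a smooth policy whose sensitivity is nonzero and shrinks as `tau` goes to zero. The problem, the function names, and the parameter values are all assumptions chosen for illustration.

```python
import math


def relaxed_policy(theta, tau):
    """Minimizer of theta*u - tau*(log(u) + log(1 - u)) over 0 < u < 1.

    Toy problem (an assumption, not the paper's MPC scheme). The closed form
    is the root of the stationarity condition theta - tau/u + tau/(1 - u) = 0
    that lies in (0, 1), written in a numerically stable way.
    """
    return 2.0 * tau / (theta + 2.0 * tau + math.sqrt(theta**2 + 4.0 * tau**2))


def policy_sensitivity(theta, tau):
    """d u / d theta of the relaxed policy, via the implicit function theorem."""
    u = relaxed_policy(theta, tau)
    # Derivative of the stationarity condition with respect to u.
    curvature = tau / u**2 + tau / (1.0 - u) ** 2
    return -1.0 / curvature


if __name__ == "__main__":
    theta = 1.0  # hypothetical cost coefficient; the exact policy is u = 0
    for tau in (1.0, 1e-1, 1e-2, 1e-4):
        u = relaxed_policy(theta, tau)
        du = policy_sensitivity(theta, tau)
        print(f"tau={tau:g}  u={u:.6f}  du/dtheta={du:.6e}")
```

In this toy setting, starting with a large `tau` and decreasing it gradually keeps the sensitivity informative early on while the relaxed policy approaches the bang-bang one, which is the intuition behind a homotopy on the barrier parameter.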
