MARLISA: Multi-Agent Reinforcement Learning with Iterative Sequential Action Selection for Load Shaping of Grid-Interactive Connected Buildings

We demonstrate that multi-agent reinforcement learning (RL) controllers can cooperate to provide more effective load shaping in a model-free, decentralized, and scalable way with very limited sharing of anonymous information. Rapid urbanization, increasing electrification, the integration of renewable energy resources, and the potential shift towards electric vehicles create new challenges for the planning and control of energy systems in smart cities. Energy storage resources can help align peaks of renewable energy generation with peaks of electricity consumption and flatten the curve of electricity demand. Model-based controllers, such as model predictive control (MPC), require developing models of the controlled systems, which is often not cost-effective or scalable. Model-free controllers, such as RL, have the potential to provide good control policies cost-effectively and to leverage historical data for training. However, it is unclear how RL algorithms can control a multitude of energy systems in a scalable, coordinated way. In this paper, we introduce MARLISA, a controller that combines multi-agent RL with our proposed iterative sequential action selection algorithm for load shaping in urban energy systems. This approach uses a reward function with individual and collective goals; the agents predict their own future electricity consumption and share this information with each other following a leader-follower schema. The RL agents are tested in four groups of nine simulated buildings, with each group located in a different climate. The buildings have diverse load and domestic hot water profiles, PV panels, thermal storage devices, heat pumps, and electric heaters. The agents are evaluated on the average of five normalized metrics: annual net electric consumption, 1 − load factor, average daily peak demand, annual peak demand, and ramping. MARLISA achieves superior results over multiple independent, uncooperative RL agents using the same reward function.
MARLISA also outperformed a manually optimized rule-based controller (RBC) benchmark, reducing the average daily peak load by 15% and ramping by 35%, and increasing the load factor by 10%. A multi-year case study on real weather data shows that MARLISA significantly outperforms the RBC within a year and converges in less than 2 years. Combining MARLISA and the RBC for the first year improves overall initial performance, because the agents learn from the RBC rather than from random exploration.
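
To make the evaluation score concrete, the following is a minimal Python sketch of how the five metrics could be computed from an hourly district net-load series and averaged after normalization. The exact definitions are not given in this abstract, so everything here is an assumption for illustration: the function names `load_shaping_metrics` and `normalized_score` are hypothetical, net consumption is taken as the plain sum of the series, the load factor is computed over the whole series as mean load divided by peak load, ramping is the sum of absolute hour-to-hour changes, and each metric is normalized by dividing by the same metric for a baseline controller (for example the RBC), so that a score below 1 indicates better load shaping than the baseline.

```python
from statistics import mean

HOURS_PER_DAY = 24


def load_shaping_metrics(net_load):
    """Return five raw load-shaping metrics for an hourly net-load series.

    Assumed definitions (not taken from the paper):
    the series covers whole days and has a positive peak.
    """
    if len(net_load) < HOURS_PER_DAY or len(net_load) % HOURS_PER_DAY != 0:
        raise ValueError("expected a whole number of days of hourly data")

    # Split the series into consecutive 24-hour days.
    days = [
        net_load[i:i + HOURS_PER_DAY]
        for i in range(0, len(net_load), HOURS_PER_DAY)
    ]
    peak = max(net_load)

    return {
        # Total net electricity drawn over the period.
        "net_consumption": sum(net_load),
        # Load factor = mean load / peak load; lower (1 - LF) is flatter.
        "one_minus_load_factor": 1.0 - mean(net_load) / peak,
        # Mean of the per-day maxima.
        "avg_daily_peak": mean(max(day) for day in days),
        # Largest hourly demand over the whole period.
        "peak_demand": peak,
        # Sum of absolute hour-to-hour changes in demand.
        "ramping": sum(abs(b - a) for a, b in zip(net_load, net_load[1:])),
    }


def normalized_score(controller_load, baseline_load):
    """Average the five metrics after dividing each by its baseline value.

    The baseline must have non-zero values for every metric
    (a perfectly flat baseline would cause a division by zero).
    """
    controller = load_shaping_metrics(controller_load)
    baseline = load_shaping_metrics(baseline_load)
    ratios = {name: controller[name] / baseline[name] for name in controller}
    return mean(ratios.values()), ratios
```

For example, comparing a perfectly flat two-day profile against a baseline that alternates between low and high demand with the same total consumption yields a score below 1, since the flat profile has no ramping, a lower peak, and a load factor of 1.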
