Action Set Based Policy Optimization for Safe Power Grid Management

Maintaining the stability of the modern power grid is becoming increasingly difficult due to fluctuating power consumption, intermittent supply from renewable energy sources, and unpredictable events such as man-made and natural disasters. Because any operation on the power grid must account for its impact on future stability, reinforcement learning (RL) has been employed to provide sequential decision-making in power grid management. However, existing methods do not consider the environment's operational constraints. As a result, the learned policy risks selecting actions that violate those constraints in emergencies, which can escalate power-line overloads and lead to large-scale blackouts. In this work, we propose a novel method for this problem that builds on a search-based planning algorithm. At the planning stage, the search space is restricted to the action set produced by the policy, and the selected action is guaranteed to satisfy the constraints because its outcome is tested with the simulation function provided by the system. At the learning stage, since gradients cannot be propagated through the search procedure to the policy, we introduce Evolution Strategies (ES) for black-box policy optimization, improving the policy directly to maximize long-run returns. In the NeurIPS 2020 Learning to Run a Power Network (L2RPN) competition, our solution safely managed the power grid and ranked first in both tracks.
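The planning stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `policy`, `simulate`, the top-k candidate scheme, and the reward/violation interface are all assumptions standing in for the policy network and the grid simulator's one-step lookahead.

```python
import numpy as np

def select_safe_action(policy, state, simulate, k=5):
    """Pick the best constraint-satisfying action from the policy's top-k set.

    Assumed interfaces (illustrative only):
      policy(state)         -> array of scores, one per discrete action
      simulate(state, a)    -> (reward, violates) from a one-step lookahead
    """
    scores = policy(state)
    # Limit the search space to the k actions the policy ranks highest.
    candidates = np.argsort(scores)[::-1][:k]
    best_action, best_reward = None, -np.inf
    for a in candidates:
        reward, violates = simulate(state, a)
        # Strictly enforce the constraints: discard any violating action.
        if not violates and reward > best_reward:
            best_action, best_reward = int(a), reward
    return best_action  # None if every candidate violates a constraint
```

Because the argmax over simulated candidates is non-differentiable, the policy producing `scores` would be trained with a black-box method such as ES rather than by backpropagating through this selection step.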