Bi-level Off-policy Reinforcement Learning for Volt/VAR Control Involving Continuous and Discrete Devices

Volt/VAR control (VVC) of active distribution networks (ADNs) involves both slow timescale discrete devices (STDDs), such as on-load tap changers (OLTCs), and fast timescale continuous devices (FTCDs), such as distributed generators, and the two kinds of devices must be coordinated in time sequence. Such VVC is formulated as a two-timescale optimization problem that jointly optimizes the FTCDs and STDDs in ADNs. Traditional optimization methods rely heavily on accurate system models, which are sometimes impractical to obtain because of the prohibitive modelling effort. In this paper, a novel bi-level off-policy reinforcement learning (RL) algorithm is proposed to solve this problem in a model-free manner. A bi-level Markov decision process (BMDP) is defined to describe the two-timescale VVC problem, and separate agents are set up for the slow and fast timescale sub-problems. For the fast timescale sub-problem, we adopt soft actor-critic, an off-policy RL method with high sample efficiency. For the slow one, we develop an off-policy multi-discrete soft actor-critic (MDSAC) algorithm to address the curse of dimensionality that arises with multiple STDDs. To mitigate the non-stationarity in the two agents' learning processes, we propose a multi-timescale off-policy correction (MTOPC) method based on importance sampling. Comprehensive numerical studies demonstrate that the proposed method achieves stable and satisfactory optimization of both STDDs and FTCDs without any model information, and that it outperforms existing two-timescale VVC methods.
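The dimensionality argument behind a multi-discrete policy can be illustrated with a minimal sketch. It is not the paper's MDSAC or MTOPC implementation: it assumes that the slow-timescale policy factorizes into one categorical distribution per device, and the importance ratio shown is the generic importance-sampling weight rather than the paper's specific correction. All function names and the device and position counts are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sample_action(probs, rng):
    # Draw one position per device, independently (assumed factorization).
    return np.array([rng.choice(len(p), p=p) for p in probs])

def joint_log_prob(probs, action):
    # Log-probability of the joint action = sum of per-device log-probs.
    return float(np.sum(np.log(probs[np.arange(len(action)), action])))

def joint_entropy(probs):
    # Entropy of a factorized distribution = sum of per-device entropies.
    return float(-np.sum(probs * np.log(probs)))

def importance_ratio(current_probs, behavior_probs, action):
    # Generic importance-sampling weight pi_current(a) / pi_behavior(a).
    return float(np.exp(joint_log_prob(current_probs, action)
                        - joint_log_prob(behavior_probs, action)))

# Hypothetical setup: 3 discrete devices with 5 positions each.
n_devices, n_positions = 3, 5
rng = np.random.default_rng(0)
probs = softmax(np.zeros((n_devices, n_positions)))  # uniform policy
action = sample_action(probs, rng)

joint_outputs = n_positions ** n_devices      # one output per joint action
factorized_outputs = n_positions * n_devices  # one output per device position
```

With these assumed sizes, a policy over joint actions needs 125 outputs while the factorized one needs 15, and the gap grows exponentially with the number of devices.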
