Multiagent-Based Reinforcement Learning for Optimal Reactive Power Dispatch

This paper proposes a fully distributed multiagent-based reinforcement learning method for optimal reactive power dispatch. In this method, two agents communicate with each other only if their corresponding buses are electrically coupled. The global rewards required for learning are obtained through a consensus-based global information discovery algorithm, which has been shown to be efficient and reliable. Based on the discovered global rewards, a distributed Q-learning algorithm is implemented to minimize active power loss while satisfying operational constraints. The proposed method does not require an accurate system model and can learn from scratch. Simulation studies on power systems of different sizes show that the method is computationally efficient and provides near-optimal solutions. The results also indicate that prior knowledge can significantly speed up the learning process and reduce the occurrence of undesirable disturbances. The proposed method therefore has good potential for online implementation.
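The two building blocks described above can be illustrated with a minimal sketch: linear average consensus over the bus-coupling graph lets every agent estimate the network-wide reward from purely local exchanges, and that estimate then drives a standard tabular Q-learning update. This is a generic illustration under assumed parameters (step size `epsilon`, learning rate `alpha`, discount `gamma`, a 4-bus path graph), not the paper's exact algorithm; all names are hypothetical.

```python
import numpy as np

def consensus_global_reward(local_rewards, adjacency, epsilon=0.1, iters=200):
    """Average consensus: each agent repeatedly mixes its value with its
    graph neighbors'. On a connected graph with a small enough epsilon,
    every agent's value converges to the mean of the local rewards, from
    which the global (summed) reward is n * mean."""
    x = np.asarray(local_rewards, dtype=float)
    A = np.asarray(adjacency, dtype=float)
    for _ in range(iters):
        # x_i <- x_i + epsilon * sum_j a_ij * (x_j - x_i)
        x = x + epsilon * (A @ x - A.sum(axis=1) * x)
    return x  # each entry approximates mean(local_rewards)

def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step driven by the discovered global reward."""
    td_target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q

# Hypothetical 4-bus path graph (agents coupled only to electrical neighbors).
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
estimates = consensus_global_reward([1.0, 2.0, 3.0, 4.0], adjacency)
global_reward = 4 * estimates[0]  # n * consensus mean ~ sum of local rewards

Q = np.zeros((2, 2))
Q = q_update(Q, state=0, action=1, reward=global_reward, next_state=1)
```

Note that `epsilon` must stay below the inverse of the maximum node degree for the consensus iteration to remain stable; larger or denser networks simply need more iterations, not more communication range.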
