A novel multi-agent decentralized win or learn fast policy hill-climbing with eligibility trace algorithm for smart generation control of interconnected complex power grids

Abstract: This paper proposes a multi-agent smart generation control scheme for automatic generation control coordination in interconnected complex power systems. A novel multi-agent decentralized win or learn fast policy hill-climbing with eligibility trace algorithm is developed, which can effectively identify optimal average policies via a variable learning rate under various operating conditions. The proposed approach is implemented, under control performance standards, in a flexible multi-agent stochastic dynamic game-based smart generation control simulation platform. Because it operates on mixed strategies and average policies, the algorithm is highly adaptive in stochastic non-Markov environments and in systems with large time delays, and can therefore fulfill automatic generation control coordination in interconnected complex power systems with increasing penetration of decentralized renewable energy. Two case studies are carried out: one on a two-area load–frequency control power system and one on the China Southern Power Grid model. Simulation results verify that the multi-agent smart generation control scheme based on the proposed approach obtains optimal average policies and thereby improves closed-loop system performance, and that it achieves a faster convergence rate and stronger robustness than competing methods.
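The approach combines the win or learn fast policy hill-climbing (WoLF-PHC) update of Bowling and Veloso with Q(λ)-style eligibility traces. The sketch below illustrates this combination for a single decentralized agent in a tabular setting; the class name WoLFPHCLambdaAgent, the hyperparameter values, and the choice of replacing traces are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of WoLF-PHC with eligibility traces for one agent.
# Assumptions: tabular states/actions, replacing traces, hypothetical
# hyperparameter values; not the paper's exact implementation.
import numpy as np

class WoLFPHCLambdaAgent:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 lam=0.8, delta_win=0.01, delta_lose=0.04):
        self.nS, self.nA = n_states, n_actions
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        # WoLF: learn slowly when winning, fast when losing (delta_lose > delta_win).
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.Q = np.zeros((n_states, n_actions))                       # action-value estimates
        self.e = np.zeros((n_states, n_actions))                       # eligibility traces
        self.pi = np.full((n_states, n_actions), 1.0 / n_actions)      # current mixed policy
        self.pi_avg = np.full((n_states, n_actions), 1.0 / n_actions)  # average policy
        self.counts = np.zeros(n_states)                               # state visit counts

    def act(self, s, rng=np.random):
        # Sample an action from the current mixed strategy.
        return rng.choice(self.nA, p=self.pi[s])

    def update(self, s, a, r, s_next):
        # 1) Q(lambda): back up the one-step TD error along all traces.
        td_error = r + self.gamma * self.Q[s_next].max() - self.Q[s, a]
        self.e *= self.gamma * self.lam        # decay every trace
        self.e[s, a] = 1.0                     # replacing trace for the visited pair
        self.Q += self.alpha * td_error * self.e

        # 2) Incrementally update the average policy for state s.
        self.counts[s] += 1
        self.pi_avg[s] += (self.pi[s] - self.pi_avg[s]) / self.counts[s]

        # 3) Variable learning rate: "winning" means the current mixed
        # policy outperforms the average policy on the current Q estimates.
        winning = self.pi[s] @ self.Q[s] > self.pi_avg[s] @ self.Q[s]
        delta = self.delta_win if winning else self.delta_lose

        # 4) Policy hill-climbing: shift probability mass toward the
        # greedy action while keeping pi[s] a valid distribution.
        greedy = self.Q[s].argmax()
        for b in range(self.nA):
            if b == greedy:
                continue
            step = min(self.pi[s, b], delta / (self.nA - 1))
            self.pi[s, b] -= step
            self.pi[s, greedy] += step

# Hypothetical usage on a toy environment:
agent = WoLFPHCLambdaAgent(n_states=8, n_actions=3)
s = 0
a = agent.act(s)
agent.update(s, a, r=-0.5, s_next=1)
```

The WoLF ingredient is the variable policy step size: a small step when the agent is winning and a large step when it is losing, which is what drives convergence toward an optimal average policy in the stochastic game setting the paper considers.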
