Deep transfer Q-learning with virtual leader-follower for supply-demand Stackelberg game of smart grid

This paper proposes a novel deep transfer Q-learning (DTQ) algorithm with a virtual leader-follower pattern for the supply-demand Stackelberg game of a smart grid. Each generator is regarded as a supplier agent and each load as a demander agent, so that economic dispatch (ED) and demand response (DR) can be solved simultaneously by DTQ. To maximize the total payoff of all agents, a virtual leader-follower pattern is employed to achieve reliable collaboration among them. Q-learning with a cooperative swarm is then adopted for each agent's knowledge learning, balancing exploration and exploitation in an unknown environment. Furthermore, the original, extremely large-scale knowledge matrix is efficiently decomposed into several small-scale knowledge matrices through a binary state-action chain, while continuous actions can still be generated for continuous variables. Lastly, a deep belief network (DBN) is used for knowledge transfer, so that DTQ can exploit prior knowledge from source tasks to rapidly obtain an optimal solution of a new task. Case studies on a 94-agent system and the practical Shenzhen power grid of southern China are carried out to evaluate the performance of DTQ for the supply-demand Stackelberg game of a smart grid.
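The decomposition of one large knowledge matrix into several small ones can be illustrated with a minimal sketch. It is not the DTQ implementation: it assumes a single agent, a single continuous variable encoded as `n_bits` binary digits, one 2x2 Q-matrix per digit whose state is the previous digit, and a reward equal to the negative cost. The function names, the quadratic cost, and all parameter values are hypothetical and chosen only for illustration; the cooperative swarm, the leader-follower pattern, and the DBN transfer are omitted.

```python
import random


def decode(bits, lo, hi):
    """Map a list of binary digits to a continuous value in [lo, hi]."""
    n = int("".join(str(b) for b in bits), 2)
    return lo + (hi - lo) * n / (2 ** len(bits) - 1)


def train_chain(objective, n_bits=8, lo=0.0, hi=1.0, episodes=2000,
                alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Q-learning over a chain of 2x2 matrices (illustrative assumption).

    q[i][s][a] is the value of choosing digit value a at position i when
    the previous digit was s (s = 0 for the first position).
    """
    rng = random.Random(seed)
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n_bits)]
    best_bits, best_cost = None, float("inf")

    for _ in range(episodes):
        # Build one candidate solution digit by digit (epsilon-greedy).
        bits, state = [], 0
        for i in range(n_bits):
            if rng.random() < epsilon:
                action = rng.randint(0, 1)            # exploration
            else:
                row = q[i][state]
                action = 0 if row[0] >= row[1] else 1  # exploitation
            bits.append(action)
            state = action

        cost = objective(decode(bits, lo, hi))
        reward = -cost  # assumed reward: lower cost is better
        if cost < best_cost:
            best_bits, best_cost = list(bits), cost

        # Standard Q-learning update applied along the chain.
        state = 0
        for i, action in enumerate(bits):
            future = max(q[i + 1][action]) if i + 1 < n_bits else 0.0
            q[i][state][action] += alpha * (
                reward + gamma * future - q[i][state][action]
            )
            state = action

    return best_bits, best_cost, q


def quadratic_cost(x):
    """Hypothetical cost with its minimum at x = 0.3."""
    return (x - 0.3) ** 2


if __name__ == "__main__":
    bits, cost, q = train_chain(quadratic_cost)
    print("best value:", decode(bits, 0.0, 1.0), "cost:", cost)
    # A flat table over the same resolution would need 2**8 = 256 actions;
    # the chain stores 8 matrices of 4 entries each.
    print("chain entries:", sum(len(r) for m in q for r in m))
```

Under these assumptions the storage grows linearly with the number of digits rather than exponentially with the resolution, which is the property the binary state-action chain is meant to provide, and the decoded digit string yields a value on a fine grid over the continuous range.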

[1]  Rahmat-Allah Hooshmand,et al.  Emission, reserve and economic load dispatch problem with non-smooth and non-convex cost functions using the hybrid bacterial foraging-Nelder–Mead algorithm , 2012 .

[2]  James A. Momoh,et al.  Improved interior point method for OPF problems , 1999 .

[3]  Gabriela Hug,et al.  Consensus + Innovations Approach for Distributed Multiagent Coordination in a Microgrid , 2015, IEEE Transactions on Smart Grid.

[4]  Frank L. Lewis,et al.  Distributed Consensus-Based Economic Dispatch With Transmission Losses , 2014, IEEE Transactions on Power Systems.

[5]  Lei Zheng,et al.  A Distributed Demand Response Control Strategy Using Lyapunov Optimization , 2014, IEEE Transactions on Smart Grid.

[6]  Yitao Liu,et al.  Deep belief network based deterministic and probabilistic wind speed forecasting approach , 2016 .

[7]  Tao Yu,et al.  Stochastic Optimal Relaxed Automatic Generation Control in Non-Markov Environment Based on Multi-Step $Q(\lambda)$ Learning , 2011, IEEE Transactions on Power Systems.

[8]  Gregor Verbic,et al.  A Faithful Distributed Mechanism for Demand Response Aggregation , 2016, IEEE Transactions on Smart Grid.

[9]  Wenhao Huang,et al.  Deep Architecture for Traffic Flow Prediction: Deep Belief Networks With Multitask Learning , 2014, IEEE Transactions on Intelligent Transportation Systems.

[10]  Enrico Zio,et al.  Reinforcement learning for microgrid energy management , 2013 .

[11]  Vincent W. S. Wong,et al.  Tackling the Load Uncertainty Challenges for Energy Consumption Scheduling in Smart Grid , 2013, IEEE Transactions on Smart Grid.

[12]  Ayman M. Eldeib,et al.  Breast cancer classification using deep belief networks , 2016, Expert Syst. Appl..

[13]  Tao Yu,et al.  Robust collaborative consensus algorithm for decentralized economic dispatch with a practical communication network , 2016 .

[14]  Seung Ho Hong,et al.  A Real-Time Demand-Response Algorithm for Smart Grids: A Stackelberg Game Approach , 2016, IEEE Transactions on Smart Grid.

[15]  Tao Yu,et al.  R(λ) imitation learning for automatic generation control of interconnected power grids , 2012, Autom..

[16]  Jian-Xin Xu,et al.  Consensus based approach for economic dispatch problem in a smart grid , 2013, IECON 2013 - 39th Annual Conference of the IEEE Industrial Electronics Society.

[17]  Sanjoy Mandal,et al.  Economic load dispatch using krill herd algorithm , 2014 .

[18]  Mengmeng Yu,et al.  Supply–demand balancing for power management in smart grid: A Stackelberg game approach , 2016 .

[19]  D. Menniti,et al.  Purchase-Bidding Strategies of an Energy Coalition With Demand-Response Capabilities , 2009, IEEE Transactions on Power Systems.

[20]  P. K. Chattopadhyay,et al.  Hybrid Differential Evolution With Biogeography-Based Optimization for Solution of Economic Load Dispatch , 2010, IEEE Transactions on Power Systems.

[21]  Xin-She Yang,et al.  Economic dispatch using chaotic bat algorithm , 2016 .

[22]  Xuesong Wang,et al.  Multi-source transfer ELM-based Q learning , 2014, Neurocomputing.

[23]  Dapeng Oliver Wu,et al.  Why Deep Learning Works: A Manifold Disentanglement Perspective , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[24]  Taher Niknam,et al.  A new fuzzy adaptive particle swarm optimization for non-smooth economic dispatch , 2010 .

[25]  Hsueh-Hsien Chang,et al.  Genetic algorithms and non-intrusive energy management system based economic dispatch for cogeneration units , 2011 .

[26]  Chao-Lung Chiang,et al.  Improved genetic algorithm for power economic dispatch of units with valve-point effects and multiple fuels , 2005 .

[27]  Mohammad Reza Hesamzadeh,et al.  Short-run economic dispatch with mathematical modelling of the adjustment cost , 2014 .

[28]  A. Philpott,et al.  Optimizing demand-side bids in day-ahead electricity markets , 2006, IEEE Transactions on Power Systems.

[29]  Mo-Yuen Chow,et al.  Convergence Analysis of the Incremental Cost Consensus Algorithm Under Different Communication Network Topologies in a Smart Grid , 2012, IEEE Transactions on Power Systems.

[30]  David H. Wolpert,et al.  No free lunch theorems for optimization , 1997, IEEE Trans. Evol. Comput..

[31]  Frank L. Lewis,et al.  A Distributed Auction-Based Algorithm for the Nonconvex Economic Dispatch Problem , 2014, IEEE Transactions on Industrial Informatics.

[32]  Gregor Verbic,et al.  A Fast Distributed Algorithm for Large-Scale Demand Response Aggregation , 2016, IEEE Transactions on Smart Grid.

[33]  Yuan Zou,et al.  Reinforcement learning-based real-time energy management for a hybrid tracked vehicle , 2016 .

[34]  Sadegh Sadeghi,et al.  Thermodynamic analysis and optimization of a geothermal Kalina cycle system using Artificial Bee Colony algorithm , 2016 .

[35]  Zhu Han,et al.  How Geo-Distributed Data Centers Do Demand Response: A Game-Theoretic Approach , 2016, IEEE Transactions on Smart Grid.

[36]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[37]  Javier Jaén Martínez,et al.  An efficient ant colony optimization strategy for the resolution of multi-class queries , 2016, Knowl. Based Syst..

[38]  Ariel Rubinstein,et al.  A Course in Game Theory , 1995 .

[39]  Dinu Calin Secui,et al.  A modified Symbiotic Organisms Search algorithm for large scale economic dispatch problem with valve-point effects , 2016 .

[40]  Hamdi Abdi,et al.  Optimal pricing in time of use demand response by integrating with dynamic economic dispatch problem , 2016 .

[41]  Yin Xu,et al.  Strategic Bidding and Compensation Mechanism for a Load Aggregator With Direct Thermostat Control Capabilities , 2018, IEEE Transactions on Smart Grid.

[42]  Q. Henry Wu,et al.  Group Search Optimizer: An Optimization Algorithm Inspired by Animal Searching Behavior , 2009, IEEE Transactions on Evolutionary Computation.

[43]  Meng Joo Er,et al.  Online tuning of fuzzy inference systems using dynamic fuzzy Q-learning , 2004, IEEE Trans. Syst. Man Cybern. Part B.

[44]  Chaohua Dai,et al.  Seeker Optimization Algorithm for Optimal Reactive Power Dispatch , 2009, IEEE Transactions on Power Systems.

[45]  R. Faranda,et al.  Load Shedding: A New Proposal , 2007, IEEE Transactions on Power Systems.

[46]  Nasrudin Abd Rahim,et al.  Solving non-convex economic dispatch problem via backtracking search algorithm , 2014 .

[47]  D. Kirschen Demand-side view of electricity markets , 2003 .

[48]  Richard A. Buswell,et al.  The implications of heat electrification on national electrical supply-demand balance under published 2050 energy scenarios , 2016 .

[49]  Alfredo Vaccaro,et al.  Decentralized Economic Dispatch in Smart Grids by Self-Organizing Dynamic Agents , 2014, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[50]  N. Grudinin Reactive power optimization using successive quadratic programming method , 1998 .

[51]  Zwe-Lee Gaing,et al.  Particle swarm optimization to solving the economic dispatch considering the generator constraints , 2003 .

[52]  P. K. Chattopadhyay,et al.  Evolutionary programming techniques for economic load dispatch , 2003, IEEE Trans. Evol. Comput..

[53]  Xiaodong Wang,et al.  Distributed Real-Time Energy Scheduling in Smart Grid: Stochastic Model and Fast Optimization , 2013, IEEE Transactions on Smart Grid.

[54]  Wenxin Liu,et al.  Distributed Online Optimal Energy Management for Smart Grids , 2015, IEEE Transactions on Industrial Informatics.

[55]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[56]  Vincent W. S. Wong,et al.  Advanced Demand Side Management for the Future Smart Grid Using Mechanism Design , 2012, IEEE Transactions on Smart Grid.