Distributed Q-Learning Algorithm for Dynamic Resource Allocation With Unknown Objective Functions and Application to Microgrid

This article studies the dynamic resource allocation problem (DRAP) with unknown cost functions and unknown resource transition functions. The goal of the agents is to minimize the sum of cost functions over given time periods in a distributed way, that is, by exchanging information only with their neighboring agents. First, we propose a distributed $Q$-learning algorithm for the DRAP with unknown cost functions and unknown resource transition functions under discrete local feasibility constraints (DLFCs). We prove that the joint policy of the agents produced by this algorithm always provides a feasible allocation (FA), that is, one that satisfies the constraints at each time period. We then study the same problem under continuous local feasibility constraints (CLFCs), for which we propose a novel distributed $Q$-learning algorithm based on function approximation and distributed optimization. The update rule for each agent's local policy also ensures that the joint policy of the agents is an FA at each time period. This property is vital for executing the $\varepsilon$-greedy policy throughout the training process. Finally, simulations are presented to demonstrate the effectiveness of the proposed algorithms.
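To illustrate why feasibility matters for $\varepsilon$-greedy exploration, the following is a minimal single-agent sketch, not the distributed algorithm of the article. It assumes a tabular $Q$ function, a cost to be minimized, and a hypothetical discrete local constraint in which an agent may allocate any integer amount from 0 up to the smaller of its available resource and a capacity limit. The names `feasible_actions`, `epsilon_greedy`, and `q_update`, as well as the step size and discount factor, are assumptions made for this example.

```python
import random


def feasible_actions(resource, max_output):
    """Hypothetical discrete local constraint: allocate 0..min(resource, max_output)."""
    return list(range(min(resource, max_output) + 1))


def epsilon_greedy(q, state, actions, epsilon, rng):
    """With probability epsilon explore a random feasible action;
    otherwise pick the feasible action with the lowest estimated cost."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return min(actions, key=lambda a: q.get((state, a), 0.0))


def q_update(q, state, action, cost, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step for cost minimization (assumed form)."""
    best_next = min((q.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (cost + gamma * best_next - old)
    return q[(state, action)]
```

Because both the exploratory and the greedy branch draw only from `feasible_actions`, every action taken during training respects the local constraint in this sketch. The article's algorithms additionally coordinate neighboring agents so that the joint allocation is feasible, which this example does not attempt.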
