Reinforcement Learning for Constrained Energy Trading Games With Incomplete Information

This paper considers the problem of designing adaptive learning algorithms that seek the Nash equilibrium (NE) of a constrained energy trading game among individually strategic players with incomplete information. In this game, each player uses a learning automaton scheme to generate an action probability distribution from his or her private information, with the aim of maximizing his or her own averaged utility. It is shown that if one of the admissible mixed strategies converges to the NE with probability one, then the averaged utility and the trading quantity converge almost surely to their respective expected values. For the given discontinuous pricing function, the utility function is proved to be upper semicontinuous and payoff secure, which guarantees the existence of a mixed-strategy NE. The strict diagonal concavity of the regularized Lagrange function additionally guarantees that the NE is unique. Finally, an adaptive learning algorithm is provided that generates the strategy probability distribution for seeking the mixed-strategy NE.
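
As a rough illustration of how a learning automaton can build an action probability distribution from observed utility alone, the sketch below applies the classic linear reward-inaction rule to a single player. It is not the algorithm of this paper: it ignores the trading constraints and the other players, and the function names, the step size, and the Bernoulli utility model are all assumptions made for the example.

```python
import random


def lri_update(probs, action, reward, step=0.1):
    """Linear reward-inaction update (generic textbook rule, assumed here).

    Moves probability mass toward `action` in proportion to a normalized
    reward in [0, 1]; a reward of 0 leaves the distribution unchanged.
    """
    return [
        p + step * reward * ((1.0 if i == action else 0.0) - p)
        for i, p in enumerate(probs)
    ]


def sample_action(probs, rng):
    """Draw one action index according to the current mixed strategy."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def run(mean_rewards, rounds=5000, step=0.05, seed=0):
    """Simulate one player against a hypothetical stationary environment.

    `mean_rewards[a]` is the assumed probability that action `a` yields
    a unit reward; the player never sees these values directly.
    """
    rng = random.Random(seed)
    probs = [1.0 / len(mean_rewards)] * len(mean_rewards)
    for _ in range(rounds):
        action = sample_action(probs, rng)
        reward = 1.0 if rng.random() < mean_rewards[action] else 0.0
        probs = lri_update(probs, action, reward, step)
    return probs
```

Each update keeps the entries nonnegative and summing to one, so the vector remains a valid mixed strategy throughout. The player only needs its own realized reward, which matches the incomplete-information setting described above in spirit.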
