Comparing Policy Gradient and Value Function Based Reinforcement Learning Methods in Simulated Electrical Power Trade

In electrical power engineering, reinforcement learning algorithms can be used to model the strategies of electricity market participants. However, traditional value-function-based reinforcement learning algorithms suffer from convergence issues when combined with value function approximators, and function approximation is required in this domain to capture the characteristics of a complex, continuous, multivariate problem space. The contribution of this paper is a comparison of policy gradient reinforcement learning methods, using artificial neural networks for policy function approximation, with traditional value-function-based methods in simulations of electricity trade. The methods are compared using a power exchange auction market model based on AC optimal power flow and a reference electric power system model.
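To make the compared approach concrete, the sketch below shows a minimal REINFORCE-style policy gradient update with a one-hidden-layer neural network policy, of the general kind the abstract describes. It is illustrative only, not the paper's implementation: the state layout, network size, and the stylised profit signal (which stands in for the AC optimal power flow market clearing) are all assumptions introduced here.

```python
# Illustrative sketch, not the paper's setup: a Gaussian policy whose mean
# is a one-hidden-layer neural network chooses a price markup for a
# generator agent; toy_reward() is a stand-in for the profit an AC-OPF
# market clearing would return.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: the state might hold a demand forecast, the
# last clearing price, etc.; the action is a single bid markup.
STATE_DIM, HIDDEN, SIGMA, LR = 3, 8, 0.2, 1e-2

W1 = rng.normal(scale=0.1, size=(HIDDEN, STATE_DIM))
w2 = rng.normal(scale=0.1, size=HIDDEN)

def policy_mean(s):
    """Mean markup proposed by the neural network policy."""
    h = np.tanh(W1 @ s)
    return w2 @ h, h

def toy_reward(s, a):
    """Stand-in profit: peaks when the markup matches a latent optimum."""
    return -(a - 0.5 * s[0]) ** 2

for episode in range(2000):
    s = rng.uniform(0.0, 1.0, size=STATE_DIM)
    mu, h = policy_mean(s)
    a = mu + SIGMA * rng.normal()      # sample action from N(mu, sigma^2)
    r = toy_reward(s, a)

    # REINFORCE: for a Gaussian policy, grad log pi(a|s) equals
    # (a - mu) / sigma^2 times the gradient of mu w.r.t. the weights.
    g = (a - mu) / SIGMA ** 2
    grad_w2 = g * h
    grad_W1 = g * np.outer(w2 * (1.0 - h ** 2), s)
    w2 += LR * r * grad_w2             # ascend the expected reward
    W1 += LR * r * grad_W1
```

Because the gradient is taken directly with respect to the policy parameters, this update sidesteps the divergence risk that value-function-based methods face when bootstrapped value estimates are passed through a function approximator.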
