A game theory–reinforcement learning (GT–RL) method to develop optimal operation policies for multi-operator reservoir systems

Summary: Reservoir systems with multiple operators can benefit from coordination of operation policies. To maximize the total benefit of such systems, the literature has normally relied on the social planner’s approach, in which operation decisions are optimized using a multi-objective optimization model with a compound system objective. While the utility of the system can be increased this way, fair allocation of benefits among the operators remains challenging for the social planner, who has to assign controversial weights to the system’s beneficiaries and their objectives. Cooperative game theory provides an alternative framework for fair and efficient allocation of the incremental benefits of cooperation. To determine the fair and efficient utility shares of the beneficiaries, cooperative game theory solution methods consider the gains of each party in the status quo (non-cooperation) as well as what can be gained through the grand coalition (the social planner’s solution, or full cooperation) and through partial coalitions. Nevertheless, estimating the benefits of different coalitions can be challenging in complex multi-beneficiary systems. Reinforcement learning can address this challenge by determining the gains of the beneficiaries at different levels of cooperation (non-cooperation, partial cooperation, and full cooperation), which provides the essential input for allocation based on cooperative game theory. This paper develops a game theory–reinforcement learning (GT–RL) method for determining the optimal operation policies in multi-operator multi-reservoir systems with respect to fairness and efficiency criteria. As a first step toward demonstrating the utility of the GT–RL method in solving complex multi-agent multi-reservoir problems without the need to develop compound objectives or assign weights, the proposed method is applied to a hypothetical three-agent three-reservoir system.
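To make the allocation step concrete, the sketch below shows how coalition gains could be turned into individual shares once they are known. It uses the Shapley value purely as one illustrative cooperative solution concept; the summary does not state which solution methods the paper applies. The agent names `A`, `B`, `C` and every benefit value in `v` are hypothetical placeholders standing in for the coalition gains that reinforcement learning would estimate, and `shapley_values` is an illustrative helper, not a function from the paper.

```python
from itertools import permutations


def shapley_values(players, v):
    """Average each player's marginal contribution over all join orders.

    `v` maps a frozenset of players (a coalition) to that coalition's benefit.
    """
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal gain from p joining the players already present.
            totals[p] += v[with_p] - v[coalition]
            coalition = with_p
    return {p: totals[p] / len(orders) for p in players}


# Hypothetical coalition benefits (assumed numbers, not results from the paper).
players = ("A", "B", "C")
v = {
    frozenset(): 0.0,
    frozenset("A"): 10.0,    # status quo (non-cooperation) gains
    frozenset("B"): 20.0,
    frozenset("C"): 30.0,
    frozenset("AB"): 40.0,   # partial coalitions
    frozenset("AC"): 50.0,
    frozenset("BC"): 60.0,
    frozenset("ABC"): 90.0,  # grand coalition (full cooperation)
}

shares = shapley_values(players, v)
print(shares)
```

With these assumed numbers the shares sum to the grand-coalition benefit (efficiency), and each agent receives more than its non-cooperative gain, which is the kind of property a fair and efficient allocation is expected to satisfy.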
