Tree‐based reinforcement learning for optimal water reservoir operation

Although it is one of the most popular and extensively studied approaches to designing water reservoir operating policies, Stochastic Dynamic Programming is plagued by a dual curse that makes it unsuitable for large water systems. First, the computational requirement grows exponentially with the number of state variables considered (curse of dimensionality). Second, an explicit model must be available to describe every system transition and the associated rewards/costs (curse of modeling). A variety of simplifications and approximations have been devised in the past, but in many cases they make the resulting operating policies inefficient and of limited relevance in practical contexts. In this paper, a reinforcement‐learning approach called fitted Q‐iteration is presented: it combines the principle of continuous approximation of the value functions with a process of learning off‐line from experience to design daily, cyclostationary operating policies. The continuous approximation, performed via tree‐based regression, mitigates the curse of dimensionality by allowing a very coarse discretization grid, compared with the dense grid that Stochastic Dynamic Programming requires to design an equally performing policy. The learning experience, in the form of a data set generated by combining historical observations and model simulations, allows us to overcome the curse of modeling. The Lake Como water system (Italy) is used as a study site to infer general guidelines on the appropriate setting of the algorithm parameters and to demonstrate the advantages of the approach, in terms of accuracy and computational effectiveness, over traditional Stochastic Dynamic Programming.
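To make the fitted Q‐iteration loop concrete, here is a minimal pure-Python sketch under explicit assumptions; it is not the authors' implementation. The reservoir is hypothetical and far simpler than the Lake Como model: storage in [0, 100], three discrete releases, uniform random inflows, a quadratic supply-deficit cost plus a spill penalty, and a discount factor of 0.9. The state omits the day of the year, so the learned policy is stationary rather than cyclostationary. A single CART-style regression tree stands in for the tree‐based regressor; ensembles such as extremely randomized trees are a common alternative. The experience tuples (s, u, r, s') come from random simulation and play the role of the data set built from observations and model runs. All names (`step`, `make_dataset`, `fit_tree`, `fitted_q_iteration`, `greedy_release`) are illustrative.

```python
import random

# Hypothetical toy reservoir (all numbers are illustrative assumptions).
S_MAX, DEMAND, GAMMA = 100.0, 10.0, 0.9
RELEASES = (0.0, 10.0, 20.0)  # discrete release decisions u


def step(s, u, inflow):
    """Mass balance s' = s + inflow - release, with spills above S_MAX."""
    release = min(u, s + inflow)  # cannot release water that is not there
    s_next = s + inflow - release
    spill = max(0.0, s_next - S_MAX)
    reward = -max(0.0, DEMAND - release) ** 2 - spill  # deficit cost + flooding
    return reward, min(s_next, S_MAX)


def make_dataset(n=300, seed=0):
    """Experience tuples (s, u, r, s') from random decisions and inflows."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s, u = rng.uniform(0.0, S_MAX), rng.choice(RELEASES)
        r, s_next = step(s, u, rng.uniform(0.0, 20.0))
        data.append((s, u, r, s_next))
    return data


def fit_tree(X, y, min_leaf=5):
    """Grow a CART-style regression tree: each split minimizes the summed
    squared error of the two children; leaves store the mean target."""
    n = len(y)
    if n < 2 * min_leaf:
        return sum(y) / n
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        ys = [y[i] for i in order]
        tot, tot_sq = sum(ys), sum(v * v for v in ys)
        left = left_sq = 0.0
        for k in range(1, n):
            left += ys[k - 1]
            left_sq += ys[k - 1] ** 2
            thr = X[order[k - 1]][j]
            if k < min_leaf or n - k < min_leaf or thr == X[order[k]][j]:
                continue
            sse = (left_sq - left ** 2 / k) + (
                (tot_sq - left_sq) - (tot - left) ** 2 / (n - k))
            if best is None or sse < best[0]:
                best = (sse, j, thr)
    if best is None:  # no admissible split: make this node a leaf
        return sum(y) / n
    _, j, thr = best
    lo = [i for i in range(n) if X[i][j] <= thr]
    hi = [i for i in range(n) if X[i][j] > thr]
    return (j, thr,
            fit_tree([X[i] for i in lo], [y[i] for i in lo], min_leaf),
            fit_tree([X[i] for i in hi], [y[i] for i in hi], min_leaf))


def predict(tree, x):
    """Walk from the root to a leaf; a bare float is a leaf."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if x[j] <= thr else right
    return tree


def fitted_q_iteration(data, n_iter=20, min_leaf=5):
    """Q_N(s, u) = regression of r + GAMMA * max_u' Q_{N-1}(s', u') on (s, u)."""
    X = [(s, u) for s, u, _, _ in data]
    q = 0.0  # Q_0 = 0 everywhere
    for _ in range(n_iter):
        y = [r + GAMMA * max(predict(q, (s_next, a)) for a in RELEASES)
             for _, _, r, s_next in data]
        q = fit_tree(X, y, min_leaf)
    return q


def greedy_release(q, s):
    """Operating policy: pick the release with the largest estimated Q."""
    return max(RELEASES, key=lambda a: predict(q, (s, a)))
```

Calling `fitted_q_iteration(make_dataset())` returns a tree approximating Q over (storage, release), and `greedy_release(q, s)` gives the release the learned policy would choose at storage `s`. The model is never queried inside the optimization loop: only the stored transitions are used. The regressor interpolates over storage from those sampled transitions instead of tabulating Q on a dense storage grid.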
