Tree-based fitted Q-iteration for multi-objective Markov decision processes in water resource management

Multi-objective Markov decision processes (MOMDPs) provide an effective modeling framework for decision-making problems involving water systems. The traditional approach is to define many single-objective problems, one for each weighted combination of the objectives, and solve each by standard optimization. This paper presents a reinforcement learning (RL) approach that learns the operating policies for all combinations of objectives in a single training process. The key idea is to extend the approximation of the action-value function, which single-objective RL performs over the state-action space, to the space of the objectives' weights. The batch-mode nature of the algorithm allows the training dataset to be enriched without further interaction with the controlled system. The approach is demonstrated on a numerical test case and evaluated on a real-world application, the Hoa Binh reservoir in Vietnam. Experimental results on the test case show that the proposed approach (multi-objective fitted Q-iteration, MOFQI) becomes computationally preferable to repeated application of its single-objective counterpart (fitted Q-iteration, FQI) when more than five weight combinations are evaluated. In the Hoa Binh case study, the operating policies computed with MOFQI and FQI have comparable efficiency, while MOFQI additionally provides a continuous approximation of the Pareto frontier at no extra computing cost.
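To make the weight-augmentation idea concrete, the sketch below implements a minimal fitted Q-iteration loop in which the objective weights are appended to the regressor's input, so a single trained model covers every convex combination of the objectives. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `mofqi`, the Dirichlet sampling of weights, the hyperparameter values, and the use of scikit-learn's ExtraTreesRegressor as the tree-based regressor are all assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor


def mofqi(batch, actions, gamma=0.99, n_iterations=50, n_weight_samples=10, seed=0):
    """Weight-augmented fitted Q-iteration (illustrative sketch).

    batch   : list of transitions (s, a, r_vec, s_next), with r_vec the
              vector of per-objective rewards.
    actions : 1-D array of candidate (discrete) actions.
    Returns a regressor approximating Q(s, a, w) for any weight vector w.
    """
    rng = np.random.default_rng(seed)
    n_obj = len(batch[0][2])

    # Enrich the batch: replicate each transition under several random
    # convex weight combinations -- no further interaction with the
    # controlled system is needed.
    X, R, S_next = [], [], []
    for s, a, r_vec, s_next in batch:
        for _ in range(n_weight_samples):
            w = rng.dirichlet(np.ones(n_obj))  # random point on the weight simplex
            X.append(np.concatenate([np.atleast_1d(s), [a], w]))
            R.append(np.dot(w, r_vec))         # weighted scalarization of the rewards
            S_next.append((np.atleast_1d(s_next), w))
    X, R = np.asarray(X), np.asarray(R)

    model = None
    for _ in range(n_iterations):
        if model is None:
            y = R  # first iteration: Q_1 is the immediate scalarized reward
        else:
            # Bellman targets: r_w + gamma * max_a' Q_k(s', a', w)
            q_next = np.empty((len(S_next), len(actions)))
            for j, a_next in enumerate(actions):
                Xp = np.asarray(
                    [np.concatenate([sn, [a_next], w]) for sn, w in S_next]
                )
                q_next[:, j] = model.predict(Xp)
            y = R + gamma * q_next.max(axis=1)
        model = ExtraTreesRegressor(n_estimators=50, random_state=seed).fit(X, y)
    return model
```

A greedy policy for any given weight vector w can then be recovered by evaluating the trained model at (s, a, w) for each candidate action and taking the maximizing action; sweeping w across the simplex yields the continuous approximation of the Pareto frontier described in the abstract.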
