Tree-based Fitted Q-iteration for Multi-Objective Markov Decision Problems

This paper is about solving multi-objective control problems using a model-free, batch-mode reinforcement-learning approach. Although many real-world applications involve several conflicting objectives, the reinforcement-learning (RL) literature has focused mainly on single-objective control problems. As a consequence, in the presence of multiple objectives, the usual approach is to solve many single-objective control problems (resulting from different combinations of the original objectives), each with standard RL techniques. The algorithm proposed in this paper is an extension of Fitted Q-iteration (FQI) that learns, in a single training process, the control policies for all linear combinations of preferences (weights) assigned to the objectives. The key idea of multi-objective FQI (MOFQI) is to extend the continuous approximation of the action-value function, which single-objective FQI performs over the state-action space, to the weight space as well. The approach is demonstrated on a real-world application of particular interest for multi-objective RL algorithms: the optimal operation of a multi-purpose water reservoir.
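The following sketch illustrates the key idea under stated assumptions: a batch of transitions with vector-valued rewards, a finite action set, and a tree-based regressor (here scikit-learn's ExtraTreesRegressor) fitted over the joint state-action-weight space. The weight-sampling scheme, hyper-parameters, and function names are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal MOFQI sketch, assuming discrete actions and a batch (s, a, r, s')
# with k-dimensional reward vectors. Not the authors' implementation.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor  # tree-based approximator, as in single-objective FQI


def mofqi(states, actions, rewards, next_states, action_set,
          n_weights=20, gamma=0.99, n_iterations=50):
    """Learn an approximation of Q(s, a, w) over the state-action-weight space.

    states:      (N, ds) visited states
    actions:     (N,)    applied discrete actions
    rewards:     (N, k)  k-dimensional reward vectors
    next_states: (N, ds) successor states
    action_set:  list of admissible discrete actions
    """
    n, k = rewards.shape
    # Sample preference vectors on the simplex; each transition is replicated once
    # per sampled weight so the regression also covers the weight space.
    weights = np.random.dirichlet(np.ones(k), size=n_weights)     # (n_weights, k)
    w_rep = np.repeat(weights, n, axis=0)                         # (n_weights*n, k)
    s_rep = np.tile(states, (n_weights, 1))
    a_rep = np.tile(actions.reshape(-1, 1), (n_weights, 1))
    s_next_rep = np.tile(next_states, (n_weights, 1))
    # Scalarised reward for each (transition, weight) pair: r_w = w . r
    r_w = np.einsum('ij,ij->i', np.tile(rewards, (n_weights, 1)), w_rep)

    X = np.hstack([s_rep, a_rep, w_rep])   # regression inputs: (s, a, w)
    y = r_w.copy()                         # first target: the scalarised reward
    model = None
    for _ in range(n_iterations):
        model = ExtraTreesRegressor(n_estimators=50).fit(X, y)
        # Bellman backup: max over candidate actions of the current Q at (s', a', w)
        q_next = np.column_stack([
            model.predict(np.hstack([s_next_rep,
                                     np.full((len(s_next_rep), 1), a_prime),
                                     w_rep]))
            for a_prime in action_set
        ])
        y = r_w + gamma * q_next.max(axis=1)
    return model
```

After training, the greedy policy for any preference vector w is obtained by evaluating the learned model at (state, a, w) for each admissible action a and picking the maximiser, without retraining for a new w.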
