Multi-objective fitted Q-iteration: Pareto frontier approximation in one single run

We present a novel batch-mode Reinforcement Learning approach for the design of optimal controllers in the presence of multiple objectives. The algorithm is an extension of Fitted Q-iteration (FQI) that enables the design of the controller for all linear combinations of preferences (weights) assigned to the objectives in a single run. The key idea of multi-objective FQI (MOFQI) is to extend the continuous approximation of the value function, which single-objective FQI performs over the state-control space, to the weight space as well. The batch-mode nature of the algorithm makes it possible to enrich the learning data at nearly no additional computational cost with respect to a single-objective formulation on the same system. The approach was tested on a case study concerning the optimal operation of a water reservoir with two objectives, where the MOFQI algorithm proved computationally preferable to repeatedly running FQI for different weight values whenever more than five points on the Pareto frontier are required.

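As a rough illustration of the idea (a sketch, not the authors' implementation), the snippet below augments a tree-based FQI loop with the preference weights: each experience tuple carries a vector of rewards, one per objective, is replicated for a set of sampled weight vectors, and a single regressor learns Q over the joint (state, control, weight) space. The function names, the discrete control set, and the use of scikit-learn's ExtraTreesRegressor are illustrative assumptions.

```python
# Minimal MOFQI-style sketch (assumptions: finite control set, tree-based
# regressor from scikit-learn, scalarisation by weighted sum of rewards).
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def mofqi(samples, controls, weights, gamma=0.99, iterations=30):
    """samples:  list of (x, u, x_next, r_vec), r_vec = one reward per objective.
    controls: discrete set of candidate controls.
    weights:  weight vectors spanning the preference simplex, e.g. [(w, 1-w), ...].
    Returns a regressor approximating Q(x, u, w)."""
    # Replicate each experience tuple for every weight vector: the learning
    # set grows, but the system is simulated/observed only once (batch mode).
    X, R, XN = [], [], []
    for (x, u, x_next, r_vec) in samples:
        for w in weights:
            X.append(np.concatenate([np.atleast_1d(x), np.atleast_1d(u),
                                     np.atleast_1d(w)]))
            R.append(np.dot(w, r_vec))            # scalarised reward w . r
            XN.append((x_next, w))
    X = np.array(X)
    R = np.array(R)

    Q = None
    for _ in range(iterations):
        if Q is None:
            target = R                            # Q_1 = immediate reward
        else:
            # Bellman backup: r_w + gamma * max_{u'} Q_k(x', u', w)
            best = np.full(len(XN), -np.inf)
            for u_next in controls:
                Xq = np.array([np.concatenate([np.atleast_1d(xn),
                                               np.atleast_1d(u_next),
                                               np.atleast_1d(w)])
                               for (xn, w) in XN])
                best = np.maximum(best, Q.predict(Xq))
            target = R + gamma * best
        Q = ExtraTreesRegressor(n_estimators=50).fit(X, target)
    return Q
```

Once trained, the single regressor can be queried at any desired weight vector w to obtain the corresponding greedy control, i.e. one point of the approximate Pareto frontier, without re-running the learning loop for each preference.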