Multi-objective safe reinforcement learning

Reinforcement learning (RL) is a method in which an agent learns which actions to take through trial and error. Recently, both multi-objective reinforcement learning (MORL) and safe reinforcement learning (SafeRL) have been studied. The objective of conventional RL is to maximize the expected reward; however, because safety is not considered, the agent may enter a fatal state. RL methods that take safety into account during or after learning have therefore been proposed. SafeRL resembles MORL in that it considers two objectives: maximizing the expected reward and satisfying safety constraints. However, to the best of our knowledge, no study has investigated the relationship between MORL and SafeRL to demonstrate that SafeRL methods can be applied to MORL tasks. This paper combines MORL with SafeRL and proposes a method for multi-objective safe RL (MOSafeRL). We applied the proposed method to the Resource Gathering task, a standard benchmark task in MORL.
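To make the link between the two settings concrete, the following is a minimal sketch of one way a safety constraint could be combined with multiple objectives: candidate policies that exceed a risk limit are discarded, and the Pareto front is then taken over the remaining return vectors. It is an illustration only, not the method proposed in this paper; the function names, the scalar risk measure, and all numeric values are hypothetical.

```python
from typing import List, Sequence, Tuple

ReturnVector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[ReturnVector]) -> List[ReturnVector]:
    """Keep the points that no other point dominates (input order is preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def safe_pareto_front(
    candidates: List[Tuple[ReturnVector, float]], risk_limit: float
) -> List[ReturnVector]:
    """Drop candidates whose risk exceeds risk_limit, then Pareto-filter the rest.

    Each candidate is (expected return per objective, risk estimate).
    The risk estimate is an assumed scalar, e.g. a probability of reaching a fatal state.
    """
    safe = [returns for returns, risk in candidates if risk <= risk_limit]
    return pareto_front(safe)


# Hypothetical policies: (expected return vector, estimated risk). Values are made up.
candidates = [
    ((0.0, 1.0), 0.05),
    ((1.0, 0.0), 0.05),
    ((1.0, 1.0), 0.40),  # best returns, but too risky under a 0.1 limit
    ((0.5, 0.5), 0.10),
    ((0.4, 0.4), 0.02),  # safe, but dominated by (0.5, 0.5)
]

print(safe_pareto_front(candidates, risk_limit=0.1))
```

With a risk limit of 0.1 the high-return but risky policy is excluded, so three safe trade-off policies remain; relaxing the limit to 1.0 would leave only the policy that dominates all others.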
