Q-Managed: A new algorithm for a multiobjective reinforcement learning

Abstract Multi-objective reinforcement learning applies reinforcement learning techniques to problems with multiple, often conflicting, objectives. To address such problems, we use a hybrid multi-objective optimization method that comes with a mathematical guarantee of finding every policy belonging to the Pareto front. This hybridization gives rise to Q-Managed, which combines the ϵ-constraint method with the Q-Learning algorithm: the former dynamically restricts the environment based on the agent's learning, so that once a region no longer yields improvement it is turned into a constraint, preventing the agent from returning to it. The algorithm's simplicity and performance stem from its single-policy design.
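A minimal sketch of the mechanism described above, assuming a tabular setting: plain Q-Learning plus a "manager" that, once a region stops yielding improvement, converts it into an ϵ-constraint and removes it from the agent's reachable set. The class name QManagedSketch, the env.peek(state, action) lookahead, and the patience-based improvement test are illustrative assumptions for exposition, not the authors' implementation.

import random
from collections import defaultdict

class QManagedSketch:
    def __init__(self, actions, alpha=0.1, gamma=0.95, eps=0.1, patience=50):
        self.Q = defaultdict(float)          # Q[(state, action)] value table
        self.actions = actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.blocked = set()                 # regions converted into constraints
        self.best = {}                       # best return seen per region
        self.stale = defaultdict(int)        # episodes without improvement
        self.patience = patience             # improvement-test threshold (assumed)

    def allowed(self, state, env):
        # The constraint step: actions leading into a blocked region are
        # removed from the agent's choice set. env.peek is a hypothetical
        # one-step lookahead returning the successor state.
        return [a for a in self.actions
                if env.peek(state, a) not in self.blocked]

    def act(self, state, env):
        # Epsilon-greedy over the constrained action set; fall back to all
        # actions if every successor is blocked.
        acts = self.allowed(state, env) or self.actions
        if random.random() < self.eps:
            return random.choice(acts)
        return max(acts, key=lambda a: self.Q[(state, a)])

    def update(self, s, a, r, s2, env):
        # Standard Q-Learning update, restricted to unblocked successors.
        target = r + self.gamma * max(
            (self.Q[(s2, b)] for b in self.allowed(s2, env)), default=0.0)
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])

    def manage(self, region, episode_return):
        # Block a region once it has stopped improving for `patience` episodes;
        # from then on it acts as a constraint on the environment.
        if episode_return > self.best.get(region, float("-inf")):
            self.best[region] = episode_return
            self.stale[region] = 0
        else:
            self.stale[region] += 1
            if self.stale[region] >= self.patience:
                self.blocked.add(region)

Constraining the action set, rather than penalizing the reward, mirrors how the ϵ-constraint method turns exhausted objectives into hard constraints, which is what lets a single-policy learner sweep out the Pareto front one region at a time.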
