A Multi-Objective Deep Reinforcement Learning Framework

Abstract This paper introduces a scalable multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We develop a high-performance MODRL framework that supports both single-policy and multi-policy strategies, as well as both linear and non-linear approaches to action selection. Experimental results on two benchmark problems (the two-objective deep sea treasure environment and the three-objective Mountain Car problem) indicate that the proposed framework finds the Pareto-optimal solutions effectively. The framework is generic and highly modularized, which allows different deep reinforcement learning algorithms to be integrated across complex problem domains. It thereby overcomes many of the disadvantages of standard multi-objective reinforcement learning methods in the current literature. The framework serves as a testbed platform that accelerates the development of MODRL for solving increasingly complicated multi-objective problems.
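
To make the distinction between linear and non-linear action selection concrete, the following is a minimal sketch and not the framework's actual code. It assumes each action has a vector of per-objective Q-values (as a multi-objective deep Q-network might output) and that larger values are better for every objective. The function names, the example numbers, and the choice of Chebyshev scalarization as the non-linear rule are all illustrative assumptions; the abstract does not name a specific non-linear method.

```python
from typing import List, Sequence

Vector = Sequence[float]


def linear_action(q_values: Sequence[Vector], weights: Vector) -> int:
    """Linear rule: pick the action with the largest weighted sum of Q-values."""
    scores = [sum(w * q for w, q in zip(weights, q_vec)) for q_vec in q_values]
    return max(range(len(scores)), key=scores.__getitem__)


def chebyshev_action(q_values: Sequence[Vector], weights: Vector, utopia: Vector) -> int:
    """Non-linear rule (assumed example): pick the action whose Q-vector has the
    smallest weighted Chebyshev distance to a utopia point."""
    dists = [
        max(w * abs(q - z) for w, q, z in zip(weights, q_vec, utopia))
        for q_vec in q_values
    ]
    return min(range(len(dists)), key=dists.__getitem__)


def pareto_front(points: Sequence[Vector]) -> List[Vector]:
    """Keep only the points that no other point Pareto-dominates."""

    def dominates(a: Vector, b: Vector) -> bool:
        # a dominates b if it is at least as good everywhere and better somewhere.
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical Q-vectors for three actions: (treasure value, time penalty).
q_values = [(1.0, -10.0), (5.0, -3.0), (8.0, -9.0)]

balanced = linear_action(q_values, weights=(0.5, 0.5))
treasure_only = linear_action(q_values, weights=(1.0, 0.0))
compromise = chebyshev_action(q_values, weights=(0.5, 0.5), utopia=(8.0, -3.0))
front = pareto_front(q_values)
```

In a single-policy setting one weight vector is fixed in advance. A multi-policy strategy would instead aim to recover several non-dominated trade-offs, which is what the `pareto_front` filter illustrates on a small set of candidate return vectors.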
