Deep Reinforcement Learning for Multiobjective Optimization

This article proposes an end-to-end framework for solving multiobjective optimization problems (MOPs) using deep reinforcement learning (DRL), which we call the DRL-based multiobjective optimization algorithm (DRL-MOA). The idea of decomposition is adopted to decompose the MOP into a set of scalar optimization subproblems, and each subproblem is then modeled as a neural network. The model parameters of all the subproblems are optimized collaboratively according to a neighborhood-based parameter-transfer strategy and the DRL training algorithm. Pareto-optimal solutions can then be obtained directly from the trained neural-network models. Specifically, the multiobjective traveling salesman problem (MOTSP) is solved in this article using the DRL-MOA method, with each subproblem modeled as a Pointer Network. Extensive experiments have been conducted to study the DRL-MOA and to compare it with various benchmark methods. It is found that once the trained model is available, it can scale to newly encountered problems without retraining. The solutions are obtained directly by a simple forward calculation of the neural network; no iteration is required, and the MOP can always be solved in a reasonable time. The proposed method provides a new way of solving MOPs by means of DRL. It exhibits a set of new characteristics, for example, strong generalization ability and fast solving speed in comparison with existing methods for multiobjective optimization. The experimental results show the effectiveness and competitiveness of the proposed method in terms of model performance and running time.
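The decomposition step described above can be illustrated with a minimal sketch. It assumes a weighted-sum scalarization and a bi-objective TSP in which each objective is the tour length under its own set of city coordinates; neither choice is stated in the abstract, and the function names (`weight_vectors`, `tour_length`, `scalarized_cost`) are hypothetical. The sketch covers only how one MOP becomes a set of scalar subproblems, not the neural-network models or their training.

```python
import math


def weight_vectors(n_subproblems):
    """Evenly spaced weight pairs (w1, w2) with w1 + w2 = 1.

    Assumption: one weight vector per scalar subproblem, spread uniformly.
    """
    if n_subproblems < 2:
        raise ValueError("need at least two subproblems")
    step = 1.0 / (n_subproblems - 1)
    return [(i * step, 1.0 - i * step) for i in range(n_subproblems)]


def tour_length(tour, coords):
    """Length of the closed tour that visits the cities in the given order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def scalarized_cost(tour, coords_a, coords_b, weights):
    """Weighted-sum cost of a tour for one subproblem (assumed scalarization)."""
    w1, w2 = weights
    return w1 * tour_length(tour, coords_a) + w2 * tour_length(tour, coords_b)


if __name__ == "__main__":
    # Hypothetical 4-city instance: objective 2 uses the same layout scaled by 2.
    coords_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    coords_b = [(2 * x, 2 * y) for x, y in coords_a]
    tour = [0, 1, 2, 3]
    for w in weight_vectors(5):
        print(w, scalarized_cost(tour, coords_a, coords_b, w))
```

Each weight vector defines one scalar subproblem. Adjacent weight vectors yield similar subproblems, which is the intuition behind transferring parameters between neighbors.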
