Learning to Solve Multiple-TSP With Time Window and Rejections via Deep Reinforcement Learning

We propose a manager-worker framework based on deep reinforcement learning (our implementation is publicly available at: https://github.com/zcaicaros/manager-worker-mtsptwr) to tackle a challenging variant of the Travelling Salesman Problem (TSP): the multiple-vehicle TSP with time windows and rejections (mTSPTWR), in which customers who cannot be served before their deadlines are rejected. In the proposed framework, a manager agent learns to divide an mTSPTWR instance into sub-routing tasks by assigning customers to vehicles through a policy network based on a Graph Isomorphism Network (GIN). A worker agent learns to solve each sub-routing task by minimizing a per-vehicle cost that accounts for both tour length and rejection rate. The maximum of these per-vehicle costs is then fed back to the manager agent so that it learns better assignments. Experimental results demonstrate that the proposed framework outperforms strong baselines in both solution quality and computation time. More importantly, the trained agents also achieve competitive performance on unseen, larger instances.
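To make the feedback signal concrete, the following is a minimal Python sketch of how one customer-to-vehicle assignment might be scored. None of the specifics come from the paper; they are illustrative assumptions: vehicles travel at unit speed, so travel time equals Euclidean distance; each customer has a window `(open, deadline)` and an early arrival waits until `open`; the per-vehicle cost is tour length plus a hypothetical weight `beta` times the rejection rate; and a greedy nearest-feasible-customer heuristic stands in for the learned worker agent. The names `route_vehicle`, `vehicle_cost`, and `manager_cost` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float]  # (open time, deadline); assumed representation


def route_vehicle(depot: Point, coords: Dict[int, Point],
                  windows: Dict[int, Window],
                  customers: Sequence[int]) -> Tuple[List[int], List[int], float]:
    """Greedy stand-in for the learned worker (an assumption, not the paper's policy).

    Repeatedly visits the nearest customer whose deadline can still be met.
    Customers that can no longer be reached in time are rejected.
    Returns (visit order, rejected customers, tour length incl. return to depot).
    """
    pos, t, length = depot, 0.0, 0.0
    pending = list(customers)
    visited: List[int] = []
    while pending:
        # Unit speed assumed: arrival time = current time + Euclidean distance.
        feasible = [c for c in pending
                    if t + math.dist(pos, coords[c]) <= windows[c][1]]
        if not feasible:
            break  # every remaining customer would be reached too late
        nxt = min(feasible, key=lambda c: math.dist(pos, coords[c]))
        d = math.dist(pos, coords[nxt])
        length += d
        t = max(t + d, windows[nxt][0])  # wait if we arrive before the window opens
        pos = coords[nxt]
        visited.append(nxt)
        pending.remove(nxt)
    length += math.dist(pos, depot)  # return to the depot
    rejected = [c for c in customers if c not in visited]
    return visited, rejected, length


def vehicle_cost(length: float, n_rejected: int, n_assigned: int,
                 beta: float = 10.0) -> float:
    """Hypothetical per-vehicle cost: tour length + beta * rejection rate."""
    rate = n_rejected / n_assigned if n_assigned else 0.0
    return length + beta * rate


def manager_cost(depot: Point, coords: Dict[int, Point],
                 windows: Dict[int, Window],
                 assignment: Dict[int, List[int]],
                 beta: float = 10.0) -> float:
    """Maximum per-vehicle cost: the scalar fed back to the manager."""
    costs = []
    for customers in assignment.values():
        _, rejected, length = route_vehicle(depot, coords, windows, customers)
        costs.append(vehicle_cost(length, len(rejected), len(customers), beta))
    return max(costs)


if __name__ == "__main__":
    depot = (0.0, 0.0)
    coords = {1: (1.0, 0.0), 2: (0.0, 2.0), 3: (3.0, 4.0)}
    windows = {1: (0.0, 5.0), 2: (0.0, 10.0), 3: (0.0, 1.0)}
    # Vehicle 0 serves customer 1 but must reject customer 3 (deadline 1, distance 5).
    print(manager_cost(depot, coords, windows, {0: [1, 3], 1: [2]}))  # → 7.0
```

In this toy instance vehicle 0 costs 2.0 + 10 × 0.5 = 7.0 and vehicle 1 costs 4.0, so the manager is scored 7.0. Because the signal is the maximum over vehicles, as the abstract describes, the manager is penalized by its worst-served vehicle, which tends to favour assignments that balance tour length and rejections across the fleet.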
