DAN: Decentralized Attention-based Neural Network to Solve the MinMax Multiple Traveling Salesman Problem

The multiple traveling salesman problem (mTSP) is a well-known NP-hard problem with numerous real-world applications. In particular, this work addresses the MinMax mTSP, where the objective is to minimize the maximum tour length (sum of Euclidean distances) among all agents. The mTSP is normally treated as a combinatorial optimization problem, but due to its computational complexity, search-based exact and heuristic algorithms become inefficient as the number of cities increases. Encouraged by recent developments in deep reinforcement learning (dRL), this work casts the mTSP as a cooperative task and introduces DAN, a decentralized attention-based neural network method for solving the MinMax mTSP. In DAN, agents learn fully decentralized policies to collaboratively construct a tour by predicting the future decisions of other agents. Our model relies on the Transformer architecture and is trained using multi-agent RL with parameter sharing, which provides natural scalability in the number of agents and cities. We experimentally evaluate our model on small- to large-scale mTSP instances involving 50 to 1000 cities and 5 to 20 agents, and compare against state-of-the-art baselines. On small-scale problems (fewer than 100 cities), DAN closely matches the performance of the best available solver (OR-Tools, a meta-heuristic solver) given the same computation time budget. On larger-scale instances, DAN outperforms both conventional and dRL-based solvers while keeping computation times low, and exhibits enhanced collaboration among agents.
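The MinMax objective described above can be made concrete with a short sketch: given city coordinates and one closed tour per agent, the cost of a solution is the length of the longest individual tour. The instance below is a hypothetical example, not one from the paper's benchmarks.

```python
import math

def tour_length(coords, tour):
    """Total Euclidean length of a closed tour (returns to its start)."""
    total = 0.0
    for i in range(len(tour)):
        x1, y1 = coords[tour[i]]
        x2, y2 = coords[tour[(i + 1) % len(tour)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def minmax_objective(coords, tours):
    """MinMax mTSP cost: the longest tour length among all agents."""
    return max(tour_length(coords, t) for t in tours)

# Hypothetical instance: depot at index 0, two agents splitting four cities.
coords = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)]
tours = [[0, 1, 2], [0, 3, 4]]
print(minmax_objective(coords, tours))  # each agent travels 4.0, so the cost is 4.0
```

A solver for the MinMax mTSP searches over both the assignment of cities to agents and the visiting order within each tour so as to minimize this quantity.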