Attention, Learn to Solve Routing Problems!

The recently presented idea of learning heuristics for combinatorial optimization problems is promising, as it can save costly development. However, to push this idea towards practical implementation, we need better models and better ways of training. We contribute in both directions: we propose a model based on attention layers with benefits over the Pointer Network, and we show how to train this model using REINFORCE with a simple baseline based on a deterministic greedy rollout, which we find is more efficient than using a value function. We significantly improve over recent learned heuristics for the Travelling Salesman Problem (TSP), getting close to optimal results for problems with up to 100 nodes. With the same hyperparameters, we learn strong heuristics for two variants of the Vehicle Routing Problem (VRP), the Orienteering Problem (OP) and (a stochastic variant of) the Prize Collecting TSP (PCTSP), outperforming a wide range of baselines and getting results close to highly optimized and specialized algorithms.
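The training scheme named above, REINFORCE with a greedy rollout baseline, can be sketched in a few lines. REINFORCE minimizes the expected tour cost using the gradient estimate (L(π) - b(s)) ∇θ log pθ(π | s). Here π is a sampled tour, and the baseline b(s) is the cost of the tour that a deterministic greedy rollout produces on the same instance s. This is a minimal, hypothetical sketch rather than the paper's implementation. The attention model is replaced by a one-parameter toy policy that picks the next node j from node i with probability proportional to exp(-θ·d(i, j)), so the gradient of the log-probability can be written by hand. The names `rollout`, `reinforce_step`, `theta` and `baseline_theta` are illustrative assumptions.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def tour_length(points: List[Point], tour: List[int]) -> float:
    """Length of the closed tour that visits `points` in the order given by `tour`."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def rollout(points: List[Point], theta: float, greedy: bool,
            rng: Optional[random.Random] = None) -> Tuple[List[int], float]:
    """Build a tour from node 0 with the toy policy p(j | i) proportional to exp(-theta * d(i, j)).

    Hypothetical stand-in for the attention model. Returns the tour and the
    derivative with respect to theta of the tour's log-probability.
    """
    tour = [0]
    unvisited = set(range(1, len(points)))
    grad_logp = 0.0
    while unvisited:
        cand = sorted(unvisited)
        d = [math.dist(points[tour[-1]], points[j]) for j in cand]
        # Numerically stable softmax over the logits -theta * d.
        logits = [-theta * dj for dj in d]
        m = max(logits)
        w = [math.exp(x - m) for x in logits]
        z = sum(w)
        p = [wi / z for wi in w]
        if greedy:
            k = max(range(len(cand)), key=lambda i: p[i])  # argmax decoding
        else:
            k = rng.choices(range(len(cand)), weights=p)[0]  # sample from the policy
        # For this softmax policy: d/dtheta log p(k) = E_p[d] - d_k.
        grad_logp += sum(pi * di for pi, di in zip(p, d)) - d[k]
        tour.append(cand[k])
        unvisited.remove(cand[k])
    return tour, grad_logp


def reinforce_step(batch: List[List[Point]], theta: float, baseline_theta: float,
                   lr: float, rng: random.Random) -> float:
    """One gradient-descent step on expected tour length with a greedy-rollout baseline."""
    grad = 0.0
    for points in batch:
        tour, grad_logp = rollout(points, theta, greedy=False, rng=rng)
        # Baseline b(s): cost of the deterministic greedy rollout on the same instance.
        greedy_tour, _ = rollout(points, baseline_theta, greedy=True)
        advantage = tour_length(points, tour) - tour_length(points, greedy_tour)
        grad += advantage * grad_logp
    grad /= len(batch)
    return theta - lr * grad
```

A training loop would call `reinforce_step` repeatedly on batches of random instances and share one seeded `random.Random` across calls. Passing the current `theta` as `baseline_theta` gives a self-critical variant. The paper instead keeps the greedy baseline policy fixed as the best model so far. It replaces that policy only when the training policy is significantly better, using a paired t-test at the end of each epoch. When a sampled tour coincides with the greedy tour, the advantage is zero and that instance contributes no gradient. This is how the baseline reduces variance without adding bias.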
