Neural Combinatorial Optimization with Reinforcement Learning

This paper presents a framework for tackling combinatorial optimization problems using neural networks and reinforcement learning. We focus on the traveling salesman problem (TSP) and train a recurrent network that, given a set of city coordinates, predicts a distribution over permutations of the cities. Using negative tour length as the reward signal, we optimize the parameters of the recurrent network with a policy gradient method. We compare learning the network parameters on a set of training graphs against learning them on individual test graphs. Despite its computational expense, and without much engineering or heuristic design, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes. Applied to the knapsack problem, another NP-hard problem, the same method obtains optimal solutions for instances with up to 200 items.
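The training signal described above can be illustrated with a minimal sketch of a policy gradient (REINFORCE) update that uses negative tour length as the reward. This is not the paper's recurrent network: as an assumption for illustration, the policy here is a toy softmax over the unvisited cities with a single scalar parameter `theta`, and the names `tour_length`, `sample_tour`, and `reinforce_step`, as well as the batch size and learning rate, are hypothetical.

```python
import numpy as np

def tour_length(coords, tour):
    """Length of the closed tour that visits coords in the given order."""
    ordered = coords[tour]
    steps = ordered - np.roll(ordered, -1, axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())

def sample_tour(coords, theta, rng):
    """Sample a permutation city by city from the toy policy.

    Returns the tour and d(log-probability of the tour)/d(theta).
    Assumed toy policy: p(next = j) is proportional to exp(-theta * dist(current, j)).
    """
    tour = [0]
    unvisited = list(range(1, len(coords)))
    grad_logp = 0.0
    while unvisited:
        dists = np.linalg.norm(coords[unvisited] - coords[tour[-1]], axis=1)
        logits = -theta * dists
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        k = int(rng.choice(len(unvisited), p=probs))
        # Gradient of log softmax(-theta * d)[k] with respect to theta.
        grad_logp += -dists[k] + probs @ dists
        tour.append(unvisited.pop(k))
    return np.array(tour), grad_logp

def reinforce_step(coords, theta, rng, batch=32, lr=0.5):
    """One REINFORCE update with a batch-mean baseline."""
    samples = [sample_tour(coords, theta, rng) for _ in range(batch)]
    rewards = np.array([-tour_length(coords, t) for t, _ in samples])
    grads = np.array([g for _, g in samples])
    baseline = rewards.mean()  # reduces the variance of the gradient estimate
    return theta + lr * np.mean((rewards - baseline) * grads)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.random((10, 2))  # 10 random cities in the unit square
    theta = 0.0                   # starts as a uniform random policy
    for _ in range(50):
        theta = reinforce_step(coords, theta, rng)
    print("learned theta:", theta)
```

Calling `reinforce_step` repeatedly on one set of coordinates loosely corresponds to learning on an individual test graph; drawing fresh coordinates at each step would loosely correspond to learning on a set of training graphs.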
