Exploratory Combinatorial Optimization with Reinforcement Learning

Many real-world problems can be reduced to combinatorial optimization on a graph, in which one must find the subset or ordering of vertices that maximizes some objective function. Because such tasks are often NP-hard and analytically intractable, reinforcement learning (RL) has shown promise as a framework for learning efficient heuristics to tackle them. Previous works construct the solution subset incrementally, adding one element at a time. However, the irreversible nature of this approach prevents the agent from revising its earlier decisions, which may be necessary given the complexity of the optimization task. We instead propose that the agent should seek to continuously improve the solution by learning to explore at test time. Our approach, exploratory combinatorial optimization (ECO-DQN), is in principle applicable to any combinatorial problem that can be defined on a graph. Experimentally, we show that our method achieves state-of-the-art RL performance on the Maximum Cut problem. Moreover, because ECO-DQN can start from an arbitrary configuration, it can be combined with other search methods to further improve performance, which we demonstrate using a simple random search.
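To make the setting concrete, the sketch below illustrates the Maximum Cut objective and a reversible "flip one vertex" move that can be applied from any starting configuration. It is not ECO-DQN: a hand-written greedy rule stands in for the learned agent, and it stops at the first local optimum rather than exploring past it. All function names (`cut_value`, `flip_gain`, `local_search`, `random_restarts`) and the edge-list format `(u, v, weight)` are assumptions made for illustration; the graph is assumed to have no self-loops.

```python
import random

def cut_value(edges, side):
    """Total weight of edges whose endpoints lie on opposite sides of the cut."""
    return sum(w for u, v, w in edges if side[u] != side[v])

def flip_gain(edges, side, v):
    """Change in cut value if vertex v is moved to the other side."""
    gain = 0
    for a, b, w in edges:
        if v in (a, b):
            # An uncut edge at v becomes cut (+w); a cut edge becomes uncut (-w).
            gain += w if side[a] == side[b] else -w
    return gain

def local_search(edges, side):
    """Greedy stand-in for a learned policy: flip the best vertex until no flip helps.

    Any vertex may be flipped at any step, so earlier decisions stay reversible.
    """
    side = list(side)
    while True:
        gains = [flip_gain(edges, side, v) for v in range(len(side))]
        best = max(range(len(side)), key=gains.__getitem__)
        if gains[best] <= 0:
            return side  # local optimum: no single flip increases the cut
        side[best] = 1 - side[best]

def random_restarts(edges, n, restarts=20, seed=0):
    """Simple random search: run the improver from several random configurations."""
    rng = random.Random(seed)
    best_side, best_val = None, float("-inf")
    for _ in range(restarts):
        start = [rng.randint(0, 1) for _ in range(n)]
        side = local_search(edges, start)
        val = cut_value(edges, side)
        if val > best_val:
            best_side, best_val = side, val
    return best_side, best_val

# Unweighted 4-cycle: 0-1-2-3-0
cycle = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
improved = local_search(cycle, [0, 0, 0, 0])
restart_side, restart_val = random_restarts(cycle, 4)
```

On this 4-cycle, starting from the all-zeros configuration, the greedy rule flips vertex 0 and then vertex 2, ending at `[1, 0, 1, 0]` with all four edges cut. A start such as `[1, 1, 0, 0]` is already a local optimum with only two edges cut, which is why restarting from several random configurations can help, and why an agent that learns to keep exploring, as the abstract proposes, may do better than a rule that stops at the first local optimum.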
