2 Common Formulation for Greedy Algorithms on Graphs

The design of good heuristics or approximation algorithms for NP-hard combinatorial optimization problems often requires significant specialized knowledge and trial-and-error. Can we automate this challenging and tedious process and learn the algorithms instead? In many real-world applications, the same optimization problem is solved again and again on a regular basis: the problem structure stays the same, and only the data differ. This provides an opportunity to learn heuristic algorithms that exploit the structure of such recurring problems. In this paper, we propose a unique combination of reinforcement learning and graph embedding to address this challenge. The learned greedy policy behaves like a meta-algorithm that constructs a solution incrementally, and each action is determined by the output of a graph embedding network that captures the current state of the solution. We show that our framework can be applied to a diverse range of optimization problems over graphs and that it learns effective algorithms for the Minimum Vertex Cover, Maximum Cut, and Traveling Salesman problems.
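As a rough illustration of the greedy meta-algorithm described above, the sketch below builds a Minimum Vertex Cover by repeatedly adding the highest-scoring node to a partial solution until every edge is covered. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `greedy_construct`, `uncovered_degree`, `has_uncovered_edge`, and `from_edges` are hypothetical, and the hand-crafted `uncovered_degree` score is an assumed stand-in for the learned graph-embedding network, which would instead evaluate each candidate node given the current partial solution.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Undirected graph as adjacency sets: node -> set of neighbours.
Graph = Dict[int, Set[int]]
# Score(graph, partial_solution, candidate_node) -> higher is better.
Score = Callable[[Graph, Set[int], int], float]


def from_edges(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build a symmetric adjacency-set graph from an edge list (hypothetical helper)."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def has_uncovered_edge(graph: Graph, chosen: Set[int]) -> bool:
    """True if some edge has neither endpoint in the partial solution."""
    return any(u not in chosen and v not in chosen for u in graph for v in graph[u])


def greedy_construct(graph: Graph, score: Score) -> List[int]:
    """Greedy meta-algorithm for Minimum Vertex Cover (hypothetical sketch).

    Starting from an empty partial solution, repeatedly add the candidate node
    with the highest score until every edge is covered.
    """
    chosen: Set[int] = set()
    order: List[int] = []  # nodes in the order they were added
    while has_uncovered_edge(graph, chosen):
        candidates = [v for v in graph if v not in chosen]
        # max() keeps the first candidate among ties.
        best = max(candidates, key=lambda v: score(graph, chosen, v))
        chosen.add(best)
        order.append(best)
    return order


def uncovered_degree(graph: Graph, chosen: Set[int], v: int) -> float:
    """Hypothetical stand-in for the learned score: the number of edges at v
    that the current partial solution does not yet cover."""
    return float(sum(1 for u in graph[v] if u not in chosen))


if __name__ == "__main__":
    path = from_edges([(0, 1), (1, 2), (2, 3), (3, 4)])
    print(greedy_construct(path, uncovered_degree))  # → [1, 3]
```

Because the scorer is passed in as a function, replacing the hand-crafted heuristic with a learned model would change only the `score` argument; the construction loop and its termination test stay the same.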
