Reinforcement Learning for Integer Programming: Learning to Cut

Integer programming (IP) is a general optimization framework widely applicable to a variety of unstructured and structured problems arising in, e.g., scheduling, production planning, and graph optimization. Because IP models many provably hard-to-solve problems, modern IP solvers rely on many heuristics. These heuristics are usually human-designed and are naturally prone to suboptimality. The goal of this work is to show that the performance of those solvers can be greatly enhanced using reinforcement learning (RL). In particular, we investigate a specific methodology for solving IPs, known as the Cutting Plane Method, which is employed as a subroutine by all modern IP solvers. We present a deep RL formulation, network architecture, and algorithms for intelligent adaptive selection of cutting planes (also known as cuts). Across a wide range of IP tasks, we show that the trained RL agent significantly outperforms human-designed heuristics and generalizes effectively to 10x larger instances and across IP problem classes. The trained agent is also shown to benefit the popular downstream application of cutting plane methods in the Branch-and-Cut algorithm, which is the backbone of state-of-the-art commercial IP solvers.
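To make the cut-selection step concrete, here is a minimal sketch of one round of it. It assumes Gomory fractional cuts as the candidate family and a simple "most fractional right-hand side" rule as the hand-designed heuristic; the abstract names neither. The function names (`gomory_cut`, `candidate_cuts`, `pick_most_fractional`) and the example tableau rows are hypothetical. In the setting the abstract describes, a learned policy would take the place of `pick_most_fractional`.

```python
from fractions import Fraction
from math import floor


def frac(x):
    """Fractional part x - floor(x), always in [0, 1)."""
    return x - floor(x)


def gomory_cut(row_coeffs, rhs):
    """Gomory fractional cut from one simplex tableau row.

    The row reads  x_basic + sum_j row_coeffs[j] * x_j = rhs,  where the x_j
    are nonbasic, nonnegative, integer-constrained variables. The cut is
        sum_j frac(row_coeffs[j]) * x_j >= frac(rhs).
    At the current LP vertex every nonbasic x_j is 0, so the left side is 0
    and the cut is violated whenever frac(rhs) > 0.
    """
    return [frac(a) for a in row_coeffs], frac(rhs)


def candidate_cuts(tableau_rows):
    """One candidate cut per row whose right-hand side is fractional."""
    return [gomory_cut(coeffs, rhs)
            for coeffs, rhs in tableau_rows
            if frac(rhs) != 0]


def pick_most_fractional(cuts):
    """Hand-designed rule (an assumption): prefer frac(rhs) closest to 1/2."""
    return min(cuts, key=lambda cut: abs(cut[1] - Fraction(1, 2)))


# Hypothetical tableau rows: (nonbasic coefficients, right-hand side).
rows = [
    ([Fraction(1, 4), Fraction(-1, 3)], Fraction(7, 2)),
    ([Fraction(1, 2), Fraction(3)], Fraction(2)),        # integral rhs: no cut
    ([Fraction(5, 3), Fraction(1, 6)], Fraction(10, 3)),
]
cuts = candidate_cuts(rows)
chosen = pick_most_fractional(cuts)
```

Exact `Fraction` arithmetic is used so that fractional parts are not distorted by floating-point rounding. A full cutting plane loop would add the chosen cut to the LP relaxation, re-solve, and repeat.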

Yunhao Tang | Shipra Agrawal | Yuri Faenza
