Learning to Perform Local Rewriting for Combinatorial Optimization

Search-based methods for hard combinatorial optimization are often guided by heuristics. Tuning heuristics in various conditions and situations is often time-consuming. In this paper, we propose NeuRewriter that learns a policy to pick heuristics and rewrite the local components of the current solution to iteratively improve it until convergence. The policy factorizes into a region-picking and a rule-picking component, each parameterized by a neural network trained with actor-critic methods in reinforcement learning. NeuRewriter captures the general structure of combinatorial problems and shows strong performance in three versatile tasks: expression simplification, online job scheduling and vehicle routing problems. NeuRewriter outperforms the expression simplification component in Z3; outperforms DeepRM and Google OR-tools in online job scheduling; and outperforms recent neural baselines and Google OR-tools in vehicle routing problems.
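The rewriting loop described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy problem (a closed tour over 2D points), the two rewriting rules, and the function names are hypothetical, and the two "policies" sample uniformly at random where NeuRewriter would use neural networks trained with actor-critic methods. The sketch only shows the control flow: pick a region, pick a rule, rewrite the local component, and keep the best solution seen.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Tour = List[int]
Rule = Callable[[Tour, int], Tour]


def tour_length(tour: Sequence[int], points: Sequence[Point]) -> float:
    """Length of the closed tour that visits the points in the given order."""
    n = len(tour)
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n)
    )


def swap_with_next(tour: Tour, i: int) -> Tour:
    """Hypothetical rule 1: swap the stop at position i with its successor."""
    new = list(tour)
    j = (i + 1) % len(new)
    new[i], new[j] = new[j], new[i]
    return new


def reverse_window(tour: Tour, i: int) -> Tour:
    """Hypothetical rule 2: reverse the window of up to three stops starting at i."""
    new = list(tour)
    end = min(i + 3, len(new))
    new[i:end] = new[i:end][::-1]
    return new


RULES: List[Rule] = [swap_with_next, reverse_window]


def pick_region(tour: Tour, rng: random.Random) -> int:
    """Stand-in for the region-picking policy: uniform over positions."""
    return rng.randrange(len(tour))


def pick_rule(tour: Tour, region: int, rng: random.Random) -> Rule:
    """Stand-in for the rule-picking policy: uniform over the rule set."""
    return rng.choice(RULES)


def local_rewrite(
    points: Sequence[Point], initial: Tour, steps: int = 200, seed: int = 0
) -> Tuple[Tour, float]:
    """Iteratively rewrite the current solution and return the best one seen."""
    rng = random.Random(seed)
    current = list(initial)
    best, best_cost = list(current), tour_length(current, points)
    for _ in range(steps):
        region = pick_region(current, rng)       # where to rewrite
        rule = pick_rule(current, region, rng)   # how to rewrite
        current = rule(current, region)          # apply the local rewrite
        cost = tour_length(current, points)
        if cost < best_cost:
            best, best_cost = list(current), cost
    return best, best_cost


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 1.5)]
    print(local_rewrite(pts, [0, 2, 1, 3, 4]))
```

Replacing `pick_region` and `pick_rule` with learned scoring functions, and using the cost change as a reward signal, is where a learned approach would differ from this random baseline.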

Yuandong Tian | Xinyun Chen

[1] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.

[2] Wojciech Zaremba, et al. Learning to Discover Efficient Mathematical Identities, 2014, NIPS.

[3] Yuval Tassa, et al. Synthesis and stabilization of complex behaviors through online trajectory optimization, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[4] Alexandre Lacoste, et al. Learning Heuristics for the TSP by Policy Gradient, 2018, CPAIOR.

[5] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.

[6] Pushmeet Kohli, et al. Adaptive Neural Compilation, 2016, NIPS.

[7] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.

[8] Reuven Bar-Yehuda, et al. A Linear-Time Approximation Algorithm for the Weighted Vertex Cover Problem, 1981, J. Algorithms.

[9] Samy Bengio, et al. Neural Combinatorial Optimization with Reinforcement Learning, 2016, ICLR.

[10] Guy E. Blelloch, et al. Optimally Scheduling Jobs with Multiple Tasks, 2017, PERV.

[11] Markus Weimer, et al. Learning To Solve Circuit-SAT: An Unsupervised Differentiable Approach, 2018, ICLR.

[12] Nikolaj Bjørner, et al. Z3: An Efficient SMT Solver, 2008, TACAS.

[13] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[14] Christopher D. Manning, et al. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks, 2015, ACL.

[15] Dawn Xiaodong Song, et al. GamePad: A Learning Environment for Theorem Proving, 2018, ICLR.

[16] Max Welling, et al. Attention, Learn to Solve Routing Problems!, 2018, ICLR.

[17] Pushmeet Kohli, et al. Learning Continuous Semantic Representations of Symbolic Expressions, 2016, ICML.

[18] Le Song, et al. 2 Common Formulation for Greedy Algorithms on Graphs, 2018.

[19] Sergey Levine, et al. Guided Policy Search, 2013, ICML.

[20] Alexander Aiken, et al. Stochastic superoptimization, 2012, ASPLOS '13.

[21] Lucia Specia, et al. Text Simplification as Tree Transduction, 2013, STIL.

[22] Sanjit A. Seshia, et al. Learning Heuristics for Automated Reasoning through Deep Reinforcement Learning, 2018, ArXiv.

[23] Srikanth Kandula, et al. Multi-resource packing for cluster schedulers, 2014.

[24] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.

[25] Richard Evans, et al. Can Neural Networks Understand Logical Entailment?, 2018, ICLR.

[26] J. Christopher Beck, et al. Queueing-theoretic approaches for dynamic scheduling: A survey, 2014.

[27] Alex Graves, et al. Neural Turing Machines, 2014, ArXiv.

[28] Richard M. Karp, et al. Reducibility Among Combinatorial Problems, 1972, 50 Years of Integer Programming.

[29] Frédo Durand, et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, 2013, PLDI 2013.

[30] Randy H. Katz, et al. A view of cloud computing, 2010, CACM.

[31] Jacek Blazewicz, et al. The job shop scheduling problem: Conventional and new solution techniques, 1996.

[32] Kaile Su, et al. Learning of Human-like Algebraic Reasoning Using Deep Feedforward Neural Networks, 2017, Biologically Inspired Cognitive Architectures.

[33] Yuedong Xu, et al. Deep Reinforcement Learning for Multi-Resource Multi-Machine Job Scheduling, 2017, ArXiv.

[34] Hongyu Guo, et al. DAG-Structured Long Short-Term Memory for Semantic Compositionality, 2016, NAACL.

[35] David L. Dill, et al. Learning a SAT Solver from Single-Bit Supervision, 2018, ICLR.

[36] Thierry Moreau, et al. Learning to Optimize Tensor Programs, 2018, NeurIPS.

[37] Edward A. Lee, et al. Learning Heuristics for Quantified Boolean Formulas through Reinforcement Learning, 2020, ICLR.

[38] Lawrence V. Snyder, et al. Reinforcement Learning for Solving the Vehicle Routing Problem, 2018, NeurIPS.

[39] Navdeep Jaitly, et al. Pointer Networks, 2015, NIPS.

[40] Alessandro Bay, et al. Approximating meta-heuristics with homotopic recurrent neural networks, 2017, ArXiv.

[41] Sergey Levine, et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, 2014, NIPS.

[42] Andrew G. Barto, et al. Adaptive linear quadratic control using policy iteration, 1994, Proceedings of 1994 American Control Conference - ACC '94.

[43] David Q. Mayne, et al. Differential Dynamic Programming–A Unified Approach to the Optimization of Dynamic Systems, 1973.

[44] Niklas Een, et al. MiniSat v1.13 - A SAT Solver with Conflict-Clause Minimization, 2005.

[45] Yuandong Tian, et al. Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation, 2013, 2013 IEEE International Conference on Computer Vision.

[46] Harald Ganzinger, et al. Rewrite-Based Equational Theorem Proving with Selection and Simplification, 1994, J. Log. Comput.

[47] Alex Graves, et al. Decoupled Neural Interfaces using Synthetic Gradients, 2016, ICML.

[48] Mirella Lapata, et al. Sentence Compression as Tree Transduction, 2009, J. Artif. Intell. Res.

[49] Srikanth Kandula, et al. Resource Management with Deep Reinforcement Learning, 2016, HotNets.

[50] David Kauchak, et al. Sentence Simplification as Tree Transduction, 2013, PITR@ACL.

[51] Hélène Kirchner, et al. The Term Rewriting Approach to Automated Theorem Proving, 1992, J. Log. Program.

[52] Frank L. Lewis, et al. Adaptive optimal control for continuous-time linear systems based on policy iteration, 2009, Autom.

[53] Yuandong Tian, et al. Automatic Local Rewriting for Combinatorial Optimization, 2018.

[54] M. Affenzeller, et al. Generic Heuristics for Combinatorial Optimization Problems, 2002.

[55] F. Glover, et al. In Modern Heuristic Techniques for Combinatorial Problems, 1993.