Learning to Perform Local Rewriting for Combinatorial Optimization

Search-based methods for hard combinatorial optimization are often guided by heuristics, and tuning those heuristics for different conditions and situations is often time-consuming. In this paper, we propose NeuRewriter, which learns a policy to pick heuristics and rewrite local components of the current solution, iteratively improving it until convergence. The policy factorizes into a region-picking component and a rule-picking component, each parameterized by a neural network trained with actor-critic methods in reinforcement learning. NeuRewriter captures the general structure of combinatorial problems and shows strong performance on three versatile tasks: expression simplification, online job scheduling, and vehicle routing. NeuRewriter outperforms the expression simplification component in Z3, outperforms DeepRM and Google OR-tools in online job scheduling, and outperforms recent neural baselines and Google OR-tools in vehicle routing.
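The rewriting loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy problem (reducing inversions in a permutation), the two rewrite rules, and the names `pick_region`, `pick_rule`, and `rewrite`. The two picker functions are simple stand-ins (uniform sampling and a greedy one-step lookahead) for the learned region-picking and rule-picking networks, and a fixed step budget replaces the convergence criterion.

```python
import random
from typing import Callable, List

Solution = List[int]
Rule = Callable[[Solution, int], Solution]


def swap_next(sol: Solution, i: int) -> Solution:
    """Hypothetical rule: swap element i with its (cyclic) successor."""
    out = list(sol)
    j = (i + 1) % len(out)
    out[i], out[j] = out[j], out[i]
    return out


def move_to_front(sol: Solution, i: int) -> Solution:
    """Hypothetical rule: move element i to the front."""
    out = list(sol)
    out.insert(0, out.pop(i))
    return out


RULES: List[Rule] = [swap_next, move_to_front]


def cost(sol: Solution) -> int:
    """Toy objective: number of inversions (0 means sorted)."""
    n = len(sol)
    return sum(1 for a in range(n) for b in range(a + 1, n) if sol[a] > sol[b])


def pick_region(sol: Solution, rng: random.Random) -> int:
    # Stand-in for the learned region-picking policy: sample uniformly.
    return rng.randrange(len(sol))


def pick_rule(sol: Solution, region: int) -> Rule:
    # Stand-in for the learned rule-picking policy: greedy one-step lookahead.
    return min(RULES, key=lambda rule: cost(rule(sol, region)))


def rewrite(sol: Solution, max_steps: int = 200, seed: int = 0) -> Solution:
    """Repeatedly pick a region, pick a rule, and apply the rewrite.

    The current solution may get worse after a step, so the best
    solution seen so far is tracked and returned.
    """
    rng = random.Random(seed)
    current = list(sol)
    best = list(sol)
    for _ in range(max_steps):
        region = pick_region(current, rng)
        rule = pick_rule(current, region)
        current = rule(current, region)
        if cost(current) < cost(best):
            best = list(current)
    return best


if __name__ == "__main__":
    start = [4, 2, 5, 1, 3, 0]
    result = rewrite(start)
    print(cost(start), cost(result), result)
```

The point of the sketch is the factorization: the region is chosen first, and the rule is chosen conditioned on that region. In a learned version, both pickers would be replaced by trained networks rather than the hand-written stand-ins used here.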
