Automatic Local Rewriting for Combinatorial Optimization

Search-based optimization for combinatorial problems traditionally relies on hand-crafted heuristics to guide the search, and deciding under which conditions each heuristic should be applied is time-consuming and can take decades of tuning. In this paper, we instead learn a policy that picks heuristics and rewrites local components of the current solution to iteratively improve it until convergence. The policy factorizes into a region-picking policy and a rule-picking policy, both realized by a neural network trained with reinforcement learning. We evaluate our approach in three domains: expression simplification, online job scheduling, and Boolean satisfiability (SAT). Our approach outperforms Z3 (De Moura & Bjørner, 2008), a state-of-the-art theorem prover, on expression simplification; DeepRM (Mao et al., 2016) and Google OR-tools (Google, 2019) on online job scheduling; and NeuroSAT (Selsam et al., 2019) and DG-DAGRNN (Amizadeh et al., 2019) on SAT problems with a small number of variables.
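
As a concrete illustration, the following is a minimal, self-contained sketch of the iterative rewriting loop described above, assuming a toy sorting objective and random stand-ins for the two learned policies. The names region_policy and rule_policy, the rule set, and the cost function are hypothetical placeholders, not the paper's neural models or training procedure.

```python
# Minimal sketch of the factorized local-rewriting scheme (illustrative only).
# The paper parameterizes both policies with a neural network trained via
# reinforcement learning; here they are random placeholders.
import random

def cost(solution):
    # Toy objective: count adjacent out-of-order pairs. Sorting stands in for
    # a real combinatorial objective such as makespan or expression size.
    return sum(a > b for a, b in zip(solution, solution[1:]))

def swap_rule(solution, i):
    # Local rewriting rule: swap the element at position i with its right neighbor.
    s = list(solution)
    s[i], s[i + 1] = s[i + 1], s[i]
    return s

def move_front_rule(solution, i):
    # Local rewriting rule: move the element at position i + 1 to the front.
    s = list(solution)
    s.insert(0, s.pop(i + 1))
    return s

RULES = [swap_rule, move_front_rule]

def region_policy(solution):
    # Stand-in for the learned region-picking policy: where to rewrite.
    return random.randrange(len(solution) - 1)

def rule_policy(solution, region):
    # Stand-in for the learned rule-picking policy: which rule to apply.
    return random.randrange(len(RULES))

def local_rewrite(solution, max_steps=500):
    # Iteratively rewrite one local component per step, keeping the best
    # solution seen so far.
    best = list(solution)
    for _ in range(max_steps):
        region = region_policy(solution)
        solution = RULES[rule_policy(solution, region)](solution, region)
        if cost(solution) < cost(best):
            best = list(solution)
    return best

if __name__ == "__main__":
    print(local_rewrite([4, 2, 5, 1, 3]))  # approaches [1, 2, 3, 4, 5]
```

In the full method, the random choices above become distributions produced by the trained policy network, so that regions and rules likely to reduce the objective are picked more often.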

References

[1] Demis Hassabis et al. Mastering the game of Go without human knowledge. Nature, 2017.

[2] Samy Bengio et al. Neural Combinatorial Optimization with Reinforcement Learning. ICLR, 2016.

[3] Le Song et al. Learning Combinatorial Optimization Algorithms over Graphs. NIPS, 2017.

[4] Guy E. Blelloch et al. Optimally Scheduling Jobs with Multiple Tasks. SIGMETRICS Performance Evaluation Review, 2017.

[5] Reuven Bar-Yehuda et al. A Linear-Time Approximation Algorithm for the Weighted Vertex Cover Problem. Journal of Algorithms, 1981.

[6] Jacek Blazewicz et al. The job shop scheduling problem: Conventional and new solution techniques. European Journal of Operational Research, 1996.

[7] Luca Antiga et al. Automatic differentiation in PyTorch. 2017.

[8] Lucia Specia et al. Text Simplification as Tree Transduction. STIL, 2013.

[9] Sanjit A. Seshia et al. Learning Heuristics for Automated Reasoning through Deep Reinforcement Learning. arXiv, 2018.

[10] Kaile Su et al. Learning of Human-like Algebraic Reasoning Using Deep Feedforward Neural Networks. Biologically Inspired Cognitive Architectures, 2017.

[11] Alessandro Bay et al. Approximating meta-heuristics with homotopic recurrent neural networks. arXiv, 2017.

[12] J. Christopher Beck et al. Queueing-theoretic approaches for dynamic scheduling: A survey. 2014.

[13] Lawrence V. Snyder et al. Reinforcement Learning for Solving the Vehicle Routing Problem. NeurIPS, 2018.

[14] Pushmeet Kohli et al. Learning Continuous Semantic Representations of Symbolic Expressions. ICML, 2016.

[15] Sergey Levine et al. Guided Policy Search. ICML, 2013.

[16] Alexander Aiken et al. Stochastic superoptimization. ASPLOS, 2013.

[17] Yuandong Tian et al. Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation. ICCV, 2013.

[18] Alex Graves et al. Decoupled Neural Interfaces using Synthetic Gradients. ICML, 2016.

[19] Sergey Levine et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics. NIPS, 2014.

[20] Frédo Durand et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. PLDI, 2013.

[21] Andrew G. Barto et al. Adaptive linear quadratic control using policy iteration. American Control Conference (ACC), 1994.

[22] Randy H. Katz et al. A view of cloud computing. Communications of the ACM, 2010.

[23] David Kauchak et al. Sentence Simplification as Tree Transduction. PITR@ACL, 2013.

[24] Markus Weimer et al. Learning To Solve Circuit-SAT: An Unsupervised Differentiable Approach. ICLR, 2019.

[25] Mirella Lapata et al. Sentence Compression as Tree Transduction. Journal of Artificial Intelligence Research, 2009.

[26] Srikanth Kandula et al. Resource Management with Deep Reinforcement Learning. HotNets, 2016.

[27] Richard S. Sutton et al. Reinforcement Learning: An Introduction. MIT Press, 1998.

[28] Frank L. Lewis et al. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica, 2009.

[29] David L. Applegate et al. The Traveling Salesman Problem: A Computational Study. Princeton University Press, 2007.

[30] Niklas Eén et al. MiniSat v1.13 - A SAT Solver with Conflict-Clause Minimization. 2005.

[31] Dawn Xiaodong Song et al. GamePad: A Learning Environment for Theorem Proving. ICLR, 2019.

[32] Max Welling et al. Attention, Learn to Solve Routing Problems! ICLR, 2019.

[33] Thierry Moreau et al. Learning to Optimize Tensor Programs. NeurIPS, 2018.

[34] M. Affenzeller et al. Generic Heuristics for Combinatorial Optimization Problems. 2002.

[35] Pushmeet Kohli et al. Adaptive Neural Compilation. NIPS, 2016.

[36] Yuedong Xu et al. Deep Reinforcement Learning for Multi-Resource Multi-Machine Job Scheduling. arXiv, 2017.

[37] Hongyu Guo et al. DAG-Structured Long Short-Term Memory for Semantic Compositionality. NAACL, 2016.

[38] Srikanth Kandula et al. Multi-resource packing for cluster schedulers. SIGCOMM, 2014.

[39] Richard Evans et al. Can Neural Networks Understand Logical Entailment? ICLR, 2018.

[40] Nikolaj Bjørner et al. Z3: An Efficient SMT Solver. TACAS, 2008.

[41] Christopher D. Manning et al. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. ACL, 2015.

[42] Wojciech Zaremba et al. Learning to Discover Efficient Mathematical Identities. NIPS, 2014.

[43] Yuval Tassa et al. Synthesis and stabilization of complex behaviors through online trajectory optimization. IROS, 2012.

[44] David L. Dill et al. Learning a SAT Solver from Single-Bit Supervision. ICLR, 2019.

[45] C. Reeves. Modern heuristic techniques for combinatorial problems. 1993.

[46] David Q. Mayne et al. Differential Dynamic Programming – A Unified Approach to the Optimization of Dynamic Systems. 1973.