Automatic Local Rewriting for Combinatorial Optimization

Search-based optimization for combinatorial problems traditionally relies on hand-crafted heuristics to guide the search, and deciding under which conditions each heuristic should be applied is time-consuming and can take decades of tuning. In this paper, we instead learn a policy that picks heuristics and rewrites local components of the current solution to iteratively improve it until convergence. The policy factorizes into a region-picking policy and a rule-picking policy, both realized by a neural network trained with reinforcement learning. We evaluate our approach in three domains: expression simplification, online job scheduling, and Boolean satisfiability (SAT). Our approach outperforms Z3 (De Moura & Bjørner, 2008), a state-of-the-art theorem prover, on expression simplification; DeepRM (Mao et al., 2016) and Google OR-tools (Google, 2019) on online job scheduling; and NeuroSAT (Selsam et al., 2019) and DG-DAGRNN (Amizadeh et al., 2019) on SAT problems with a small number of variables.
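
As a concrete illustration, the following is a minimal, self-contained sketch of the iterative rewriting loop described above, assuming a toy sorting objective and random stand-ins for the two learned policies. The names region_policy and rule_policy, the rule set, and the cost function are hypothetical placeholders, not the paper's neural models or training procedure.

```python
# Minimal sketch of the factorized local-rewriting scheme (illustrative only).
# The paper parameterizes both policies with a neural network trained via
# reinforcement learning; here they are random placeholders.
import random

def cost(solution):
    # Toy objective: count adjacent out-of-order pairs. Sorting stands in for
    # a real combinatorial objective such as makespan or expression size.
    return sum(a > b for a, b in zip(solution, solution[1:]))

def swap_rule(solution, i):
    # Local rewriting rule: swap the element at position i with its right neighbor.
    s = list(solution)
    s[i], s[i + 1] = s[i + 1], s[i]
    return s

def move_front_rule(solution, i):
    # Local rewriting rule: move the element at position i + 1 to the front.
    s = list(solution)
    s.insert(0, s.pop(i + 1))
    return s

RULES = [swap_rule, move_front_rule]

def region_policy(solution):
    # Stand-in for the learned region-picking policy: where to rewrite.
    return random.randrange(len(solution) - 1)

def rule_policy(solution, region):
    # Stand-in for the learned rule-picking policy: which rule to apply.
    return random.randrange(len(RULES))

def local_rewrite(solution, max_steps=500):
    # Iteratively rewrite one local component per step, keeping the best
    # solution seen so far.
    best = list(solution)
    for _ in range(max_steps):
        region = region_policy(solution)
        solution = RULES[rule_policy(solution, region)](solution, region)
        if cost(solution) < cost(best):
            best = list(solution)
    return best

if __name__ == "__main__":
    print(local_rewrite([4, 2, 5, 1, 3]))  # approaches [1, 2, 3, 4, 5]
```

In the full method, the random choices above become distributions produced by the trained policy network, so that regions and rules likely to reduce the objective are picked more often.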

References

[1] Demis Hassabis et al. Mastering the game of Go without human knowledge. Nature, 2017.

[2] Samy Bengio et al. Neural Combinatorial Optimization with Reinforcement Learning. ICLR, 2016.

[3] Le Song et al. Learning Combinatorial Optimization Algorithms over Graphs. NIPS, 2017.

[4] Guy E. Blelloch et al. Optimally Scheduling Jobs with Multiple Tasks. SIGMETRICS Performance Evaluation Review, 2017.

[5] Reuven Bar-Yehuda et al. A Linear-Time Approximation Algorithm for the Weighted Vertex Cover Problem. Journal of Algorithms, 1981.

[6] Jacek Blazewicz et al. The job shop scheduling problem: Conventional and new solution techniques. European Journal of Operational Research, 1996.

[7] Luca Antiga et al. Automatic differentiation in PyTorch. 2017.

[8] Lucia Specia et al. Text Simplification as Tree Transduction. STIL, 2013.

[9] Sanjit A. Seshia et al. Learning Heuristics for Automated Reasoning through Deep Reinforcement Learning. arXiv, 2018.

[10] Kaile Su et al. Learning of Human-like Algebraic Reasoning Using Deep Feedforward Neural Networks. Biologically Inspired Cognitive Architectures, 2017.

[11] Alessandro Bay et al. Approximating meta-heuristics with homotopic recurrent neural networks. arXiv, 2017.

[12] J. Christopher Beck et al. Queueing-theoretic approaches for dynamic scheduling: A survey. 2014.

[13] Lawrence V. Snyder et al. Reinforcement Learning for Solving the Vehicle Routing Problem. NeurIPS, 2018.

[14] Pushmeet Kohli et al. Learning Continuous Semantic Representations of Symbolic Expressions. ICML, 2016.

[15] Sergey Levine et al. Guided Policy Search. ICML, 2013.

[16] Alexander Aiken et al. Stochastic superoptimization. ASPLOS, 2013.

[17] Yuandong Tian et al. Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation. ICCV, 2013.

[18] Alex Graves et al. Decoupled Neural Interfaces using Synthetic Gradients. ICML, 2016.

[19] Sergey Levine et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics. NIPS, 2014.

[20] Frédo Durand et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. PLDI, 2013.

[21] Andrew G. Barto et al. Adaptive linear quadratic control using policy iteration. American Control Conference (ACC), 1994.

[22] Randy H. Katz et al. A view of cloud computing. Communications of the ACM, 2010.

[23] David Kauchak et al. Sentence Simplification as Tree Transduction. PITR@ACL, 2013.

[24] Markus Weimer et al. Learning To Solve Circuit-SAT: An Unsupervised Differentiable Approach. ICLR, 2019.

[25] Mirella Lapata et al. Sentence Compression as Tree Transduction. Journal of Artificial Intelligence Research, 2009.

[26] Srikanth Kandula et al. Resource Management with Deep Reinforcement Learning. HotNets, 2016.

[27] Richard S. Sutton et al. Reinforcement Learning: An Introduction. MIT Press, 1998.

[28] Frank L. Lewis et al. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica, 2009.

[29] David L. Applegate et al. The Traveling Salesman Problem: A Computational Study. Princeton University Press, 2007.

[30] Niklas Eén et al. MiniSat v1.13 - A SAT Solver with Conflict-Clause Minimization. 2005.

[31] Dawn Xiaodong Song et al. GamePad: A Learning Environment for Theorem Proving. ICLR, 2019.

[32] Max Welling et al. Attention, Learn to Solve Routing Problems! ICLR, 2019.

[33] Thierry Moreau et al. Learning to Optimize Tensor Programs. NeurIPS, 2018.

[34] M. Affenzeller et al. Generic Heuristics for Combinatorial Optimization Problems. 2002.

[35] Pushmeet Kohli et al. Adaptive Neural Compilation. NIPS, 2016.

[36] Yuedong Xu et al. Deep Reinforcement Learning for Multi-Resource Multi-Machine Job Scheduling. arXiv, 2017.

[37] Hongyu Guo et al. DAG-Structured Long Short-Term Memory for Semantic Compositionality. NAACL, 2016.

[38] Srikanth Kandula et al. Multi-resource packing for cluster schedulers. SIGCOMM, 2014.

[39] Richard Evans et al. Can Neural Networks Understand Logical Entailment? ICLR, 2018.

[40] Nikolaj Bjørner et al. Z3: An Efficient SMT Solver. TACAS, 2008.

[41] Christopher D. Manning et al. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. ACL, 2015.

[42] Wojciech Zaremba et al. Learning to Discover Efficient Mathematical Identities. NIPS, 2014.

[43] Yuval Tassa et al. Synthesis and stabilization of complex behaviors through online trajectory optimization. IROS, 2012.

[44] David L. Dill et al. Learning a SAT Solver from Single-Bit Supervision. ICLR, 2019.

[45] C. Reeves. Modern heuristic techniques for combinatorial problems. 1993.

[46] David Q. Mayne et al. Differential Dynamic Programming – A Unified Approach to the Optimization of Dynamic Systems. 1973.