Paper information - Improving SAT Solver Heuristics with Graph Networks and Reinforcement Learning

We present GQSAT, a branching heuristic in a Boolean SAT solver trained with value-based reinforcement learning (RL) using graph neural networks for function approximation. Solvers using GQSAT are complete SAT solvers that either provide a satisfying assignment or a proof of unsatisfiability, which is required for many SAT applications. Branching heuristics commonly used in SAT solvers today suffer from bad decisions during their warm-up period, whereas GQSAT has been trained to examine the structure of the particular problem instance to make better decisions at the beginning of the search. Training GQSAT is data efficient and does not require elaborate dataset preparation or feature engineering. We train GQSAT on small SAT problems using RL interfacing with an existing SAT solver. We show that GQSAT is able to reduce the number of iterations required to solve SAT problems by 2-3X, and that it generalizes to unsatisfiable SAT instances, as well as to problems with 5X more variables than it was trained on. We also show that, to a lesser extent, it generalizes to SAT problems from different domains by evaluating it on graph coloring. Our experiments show that augmenting SAT solvers with agents trained with RL and graph neural networks can improve performance on the SAT search problem.
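To make the idea of a value-based branching heuristic concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a bipartite variable-clause graph encoding of the CNF formula (the abstract does not specify the encoding) and swaps the trained graph network for a hypothetical placeholder scorer, `placeholder_q_values`, that simply counts how many clauses an assignment would satisfy. The names `cnf_to_bipartite_edges` and `pick_branch` are likewise illustrative.

```python
from typing import Callable, Dict, List, Set, Tuple

Clause = List[int]            # DIMACS-style literals: 3 means x3, -3 means NOT x3
Edge = Tuple[int, int, int]   # (variable, clause index, polarity: 1 positive, 0 negated)
Action = Tuple[int, bool]     # (variable, truth value to assign)
QFunction = Callable[[List[Edge], Set[int]], Dict[Action, float]]


def cnf_to_bipartite_edges(clauses: List[Clause]) -> List[Edge]:
    """Encode a CNF formula as variable-clause edges labelled with literal polarity.

    Assumption: a bipartite variable-clause graph; the abstract does not state this.
    """
    edges: List[Edge] = []
    for clause_idx, clause in enumerate(clauses):
        for lit in clause:
            edges.append((abs(lit), clause_idx, 1 if lit > 0 else 0))
    return edges


def placeholder_q_values(edges: List[Edge], unassigned: Set[int]) -> Dict[Action, float]:
    """Hypothetical stand-in for a trained graph network (NOT the GQSAT model).

    Scores each (variable, value) action by the number of clauses in which
    that assignment would make a literal true.
    """
    q: Dict[Action, float] = {(v, val): 0.0 for v in unassigned for val in (False, True)}
    for var, _clause_idx, polarity in edges:
        if var in unassigned:
            q[(var, bool(polarity))] += 1.0
    return q


def pick_branch(
    clauses: List[Clause],
    unassigned: Set[int],
    q_fn: QFunction = placeholder_q_values,
) -> Action:
    """Greedy value-based branching: choose the action with the highest Q-value."""
    q = q_fn(cnf_to_bipartite_edges(clauses), unassigned)
    # Iterating in sorted order makes ties resolve to the smallest (variable, value).
    return max(sorted(q), key=lambda action: q[action])


if __name__ == "__main__":
    formula = [[1, -2], [1, 3], [-1, 2]]
    print(pick_branch(formula, {1, 2, 3}))
```

In a setup like the one the abstract describes, the host solver would presumably call something like `pick_branch` at each decision point while keeping its own propagation and conflict analysis, which is what would keep the overall solver complete. A learned Q-function would replace the placeholder through the `q_fn` parameter.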

Shimon Whiteson | Bryan Catanzaro | Vitaly Kurin | Saad Godil
