Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?

We present Graph-$Q$-SAT, a branching heuristic for a Boolean SAT solver trained with value-based reinforcement learning (RL) using Graph Neural Networks for function approximation. Solvers using Graph-$Q$-SAT are complete SAT solvers that either provide a satisfying assignment or a proof of unsatisfiability, which is required for many SAT applications. The branching heuristics commonly used in SAT solvers make poor decisions during their warm-up period, whereas Graph-$Q$-SAT is trained to examine the structure of the particular problem instance to make better decisions early in the search. Training Graph-$Q$-SAT is data efficient and does not require elaborate dataset preparation or feature engineering. We train Graph-$Q$-SAT using RL interfacing with the MiniSat solver and show that Graph-$Q$-SAT can reduce the number of iterations required to solve SAT problems by 2-3X. Furthermore, it generalizes to unsatisfiable SAT instances, as well as to problems with 5X more variables than it was trained on. We show that for larger problems, reductions in the number of iterations lead to wall clock time reductions, the ultimate goal when designing heuristics. We also show positive zero-shot transfer behavior when testing Graph-$Q$-SAT on a task family different from that used for training. While more work is needed to apply Graph-$Q$-SAT to reduce wall clock time in modern SAT solving settings, it is a compelling proof-of-concept showing that RL equipped with Graph Neural Networks can learn a generalizable branching heuristic for SAT search.
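To make the core idea concrete, the sketch below illustrates one way the general approach can be set up: a CNF formula is encoded as a bipartite variable-clause graph, a small message-passing network predicts a Q-value for each (variable, polarity) pair, and the solver branches on the literal with the highest predicted Q-value. This is a minimal illustration under our own assumptions, not the authors' architecture or training loop; names such as `encode_cnf` and `QNet` are hypothetical placeholders.

```python
# Minimal sketch: bipartite variable-clause graph encoding of a CNF formula
# and a tiny message-passing Q-network that scores branching literals.
import torch
import torch.nn as nn

def encode_cnf(clauses, num_vars):
    """Build a bipartite graph: one node per variable, one node per clause.
    Edge features mark the polarity of the literal in the clause."""
    src, dst, edge_feat = [], [], []
    for c_idx, clause in enumerate(clauses):
        for lit in clause:
            src.append(abs(lit) - 1)              # variable node index
            dst.append(num_vars + c_idx)          # clause node index (offset)
            edge_feat.append([1.0, 0.0] if lit > 0 else [0.0, 1.0])
    edge_index = torch.tensor([src, dst], dtype=torch.long)
    edge_attr = torch.tensor(edge_feat)
    # Initial node features simply distinguish variable vs. clause nodes.
    x = torch.zeros(num_vars + len(clauses), 2)
    x[:num_vars, 0] = 1.0
    x[num_vars:, 1] = 1.0
    return x, edge_index, edge_attr

class QNet(nn.Module):
    """One round of clause-to-variable message passing, then a head that
    outputs two Q-values per variable (branch positive / branch negative)."""
    def __init__(self, hidden=32):
        super().__init__()
        self.embed = nn.Linear(2, hidden)
        self.msg = nn.Sequential(nn.Linear(2 * hidden + 2, hidden), nn.ReLU())
        self.q_head = nn.Linear(2 * hidden, 2)

    def forward(self, x, edge_index, edge_attr, num_vars):
        h = torch.relu(self.embed(x))
        v, c = edge_index                          # edges run variable -> clause
        # Compute a message per edge, then aggregate messages at variables.
        m = self.msg(torch.cat([h[v], h[c], edge_attr], dim=-1))
        agg = torch.zeros(num_vars, m.size(-1)).index_add_(0, v, m)
        return self.q_head(torch.cat([h[:num_vars], agg], dim=-1))  # [num_vars, 2]

# Usage: greedily pick the branching literal from the predicted Q-values.
clauses = [[1, -2], [2, 3], [-1, -3]]             # (x1 v ~x2) & (x2 v x3) & (~x1 v ~x3)
x, ei, ea = encode_cnf(clauses, num_vars=3)
q = QNet()(x, ei, ea, num_vars=3)                 # shape [3, 2]
var, polarity = divmod(int(q.argmax()), 2)
print(f"branch on variable {var + 1}, polarity {'+' if polarity == 0 else '-'}")
```

In a value-based RL setup of this kind, the network's parameters would be fitted with a DQN-style temporal-difference objective, with the solver (e.g., MiniSat exposed through an RL environment) providing the transitions and a per-step penalty so that fewer solver iterations yield higher return.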
