Learning to Select Branching Rules in the DPLL Procedure for Satisfiability

Abstract The DPLL procedure is the most popular complete satisfiability (SAT) solver. Although its worst-case complexity is exponential, its actual running time is greatly affected by the ordering of branching variables during the search. Several branching rules have been proposed, but none is best in all cases. This work investigates automated methods for choosing the most appropriate branching rule at each node of the search tree. We consider a reinforcement-learning approach in which a value function, predicting the performance of each branching rule in each situation, is learned through trial runs on a representative problem set from the target class of SAT problems. Our results indicate that, given sufficient training on a given class, the resulting strategy performs as well as (and, in some cases, better than) the best branching rule for that class.

Research supported in part by NSF grant IRI-9702576. The first author was also partially supported by the Lilian-Voudouri Foundation in Greece. The authors gratefully acknowledge the influence of Don Loveland, Ron Parr, and Henry Kautz in helping to shape this work.
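As a rough illustration of the approach summarized above, the sketch below shows a plain DPLL-style search (without unit propagation or clause learning) that, at each branching node, scores a small set of candidate branching rules with a learned linear value function over simple formula features and applies the rule with the lowest predicted cost. The rule set, feature definitions, and weights here are hypothetical placeholders for illustration only; they are not the rules, features, or training procedure used in the paper.

```python
# Illustrative sketch only: a minimal DPLL-style search (no unit propagation)
# that selects a branching rule at each node via a learned linear value
# function. Rules, features, and weights are hypothetical placeholders.

def rule_max_occurrence(clauses):
    """Branch on the variable appearing in the most clauses."""
    counts = {}
    for clause in clauses:
        for lit in clause:
            counts[abs(lit)] = counts.get(abs(lit), 0) + 1
    return max(counts, key=counts.get)

def rule_shortest_clause(clauses):
    """Branch on a variable from one of the shortest clauses."""
    shortest = min(clauses, key=len)
    return abs(shortest[0])

RULES = [rule_max_occurrence, rule_shortest_clause]

def features(clauses):
    """Hypothetical state features: bias term, clause count, variable count,
    and fraction of binary clauses."""
    n_clauses = len(clauses)
    n_vars = len({abs(lit) for clause in clauses for lit in clause})
    frac_binary = sum(1 for c in clauses if len(c) == 2) / max(n_clauses, 1)
    return [1.0, float(n_clauses), float(n_vars), frac_binary]

def predicted_cost(weights, phi):
    """Linear estimate of remaining search cost if a rule is applied here."""
    return sum(w * f for w, f in zip(weights, phi))

def select_rule(weight_table, clauses):
    """Choose the branching rule with the lowest predicted cost."""
    phi = features(clauses)
    return min(range(len(RULES)),
               key=lambda i: predicted_cost(weight_table[i], phi))

def simplify(clauses, lit):
    """Assign literal `lit` true: drop satisfied clauses, shorten the rest.
    Returns None if an empty clause (conflict) is produced."""
    result = []
    for clause in clauses:
        if lit in clause:
            continue
        reduced = [l for l in clause if l != -lit]
        if not reduced:
            return None
        result.append(reduced)
    return result

def dpll(clauses, weight_table):
    """Return True iff the clause set is satisfiable."""
    if not clauses:
        return True
    var = RULES[select_rule(weight_table, clauses)](clauses)
    for lit in (var, -var):
        reduced = simplify(clauses, lit)
        if reduced is not None and dpll(reduced, weight_table):
            return True
    return False

if __name__ == "__main__":
    # Weights would normally come from training runs; these are arbitrary.
    weights = [[0.0, 1.0, 0.5, -2.0],   # predicted cost under rule 0
               [0.0, 1.2, 0.4, -1.0]]   # predicted cost under rule 1
    cnf = [[1, 2], [-1, 3], [-2, -3]]   # a small satisfiable formula
    print(dpll(cnf, weights))           # expected output: True
```

In this sketch the weight table stands in for the value function that the paper proposes to learn from trial runs; in practice the weights would be fit so that the predicted cost tracks the observed search effort under each rule on formulas from the target class.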