Learning to Generate Industrial SAT Instances

In this paper, we present SATGEN, the first implicit model that generates Boolean Satisfiability formulas resembling instances that arise in real-world industrial settings. Our approach uses unsupervised machine learning techniques to create new formulas by mimicking the structural properties of a given input formula Φ. We proceed in two phases. In the first phase, we construct the Literal Incidence Graph (LIG) of Φ, which a Generative Adversarial Network uses to generate new LIGs that exhibit graph-theoretic properties similar to those of the LIG of Φ. In the second phase, we extract a formula Φ′ whose LIG corresponds to the generated graph. Generating such a formula is equivalent to finding a minimal clique edge cover of the given graph, which we tackle efficiently using a greedy hill-climbing algorithm. We verify experimentally that our approach generates formulas that closely resemble a given real-world SAT instance, as measured by a range of metrics.
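
To make the two graph-side steps concrete, the following is a minimal sketch, not the SATGEN implementation. It assumes clauses are lists of DIMACS-style integer literals, that the LIG joins two literals whenever they occur in the same clause, and it uses a plain greedy clique-growing heuristic as a stand-in for the greedy hill-climbing algorithm mentioned above. The GAN step is omitted, and the function names `literal_incidence_graph` and `greedy_clique_edge_cover` are hypothetical.

```python
from itertools import combinations


def edge(a, b):
    """Normalise an undirected edge as a sorted pair of literals."""
    return (a, b) if a < b else (b, a)


def literal_incidence_graph(clauses):
    """Return the LIG as an edge set: literals sharing a clause are adjacent."""
    edges = set()
    for clause in clauses:
        for a, b in combinations(set(clause), 2):
            edges.add(edge(a, b))
    return edges


def greedy_clique_edge_cover(edges):
    """Cover every edge with cliques; each clique is read off as one clause.

    Illustrative heuristic only: seed a clique with an uncovered edge, then
    repeatedly add the common neighbour that covers the most uncovered edges.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    uncovered = set(edges)
    clauses = []
    while uncovered:
        u, v = min(uncovered)            # deterministic choice of seed edge
        clique = {u, v}
        candidates = adj[u] & adj[v]     # vertices adjacent to the whole clique
        while candidates:
            best = max(
                sorted(candidates),
                key=lambda w: sum(edge(w, c) in uncovered for c in clique),
            )
            if not any(edge(best, c) in uncovered for c in clique):
                break                    # growing further covers nothing new
            clique.add(best)
            candidates &= adj[best]      # keep only vertices adjacent to best too
        uncovered -= {edge(a, b) for a, b in combinations(clique, 2)}
        clauses.append(sorted(clique))
    return clauses


# Three binary clauses whose LIG is a triangle on the literals 1, 2, 3.
phi = [[1, 2], [2, 3], [1, 3]]
lig = literal_incidence_graph(phi)
phi_prime = greedy_clique_edge_cover(lig)   # a single clause: [[1, 2, 3]]
```

In this toy case Φ′ differs from Φ (one ternary clause instead of three binary ones) yet has the same LIG, which illustrates why the extraction step is a covering problem with many valid answers rather than an inversion.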