Beyond the structure of SAT formulas

Nowadays, many real-world problems are encoded as SAT instances and efficiently solved by modern SAT solvers. These solvers, usually known as Conflict-Driven Clause Learning (CDCL) SAT solvers, combine a variety of sophisticated techniques, such as clause learning, lazy data structures, conflict-based adaptive branching heuristics, and random restarts, among others. However, the reasons for their efficiency on real-world, or industrial, SAT instances are still unknown. The common wisdom in the SAT community is that these techniques exploit some hidden structure of real-world problems.

In this thesis, we characterize two important features of the underlying structure of industrial SAT instances: their community structure and their self-similar structure. We observe that most industrial SAT formulas, viewed as graphs, have both properties. This means that (i) in a graph with a clear community structure, i.e., one with high modularity, we can find a partition of its nodes into communities such that most edges connect nodes of the same community; and (ii) a graph with a self-similar pattern, i.e., a fractal graph, keeps its shape after re-scaling, i.e., after grouping sets of nodes into single nodes. We also analyze how these structures are affected by CDCL techniques during the search.

Using these structural studies, we propose three applications. First, we address the problem of generating pseudo-industrial random SAT instances using the notion of modularity. Our model generates instances similar to classical random SAT formulas when the modularity is low. When the modularity is high, it is also adequate for modeling realistic pseudo-industrial problems. Second, we propose a method based on the community structure of the instance to detect relevant learnt clauses. Our technique augments the original instance with this set of relevant clauses, which results in an overall efficiency improvement for several state-of-the-art CDCL SAT solvers. Finally, we analyze the classification of industrial SAT instances into families using the structure features studied previously, and we compare them with other classifiers commonly used in portfolio SAT approaches.

In summary, this dissertation extends the understanding of the structure of SAT instances, with the aim of better explaining the success of CDCL techniques and possibly improving them, and proposes a number of applications based on this analysis of the underlying structure of SAT formulas.
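To make the notion of modularity concrete, the following minimal Python sketch scores a given partition of a formula's variables. It rests on several illustrative assumptions, not the thesis's exact definitions:

- Clauses are lists of non-zero DIMACS-style integers.
- The graph is a weighted variable incidence graph (VIG). A clause with k distinct variables adds weight 1/C(k, 2) to the edge between each pair of its variables.
- The helper names `variable_incidence_graph` and `modularity` are hypothetical.

Finding a high-modularity partition is the job of a community-detection heuristic (for example, the Louvain method). The sketch only evaluates a partition that is already given.

```python
from collections import defaultdict
from itertools import combinations


def variable_incidence_graph(clauses):
    """Build a weighted VIG (assumed weighting): nodes are variables, and each
    clause with k >= 2 distinct variables adds 1 / C(k, 2) to every pair."""
    weights = defaultdict(float)
    for clause in clauses:
        variables = sorted({abs(lit) for lit in clause})  # drop polarity and duplicates
        k = len(variables)
        if k < 2:
            continue  # unit clauses contribute no edges
        w = 1.0 / (k * (k - 1) / 2)
        for u, v in combinations(variables, 2):
            weights[(u, v)] += w
    return dict(weights)


def modularity(weights, community):
    """Newman's modularity Q: for each community, the fraction of edge weight
    inside it minus the squared fraction of total degree it holds."""
    total = sum(weights.values())  # W, the total edge weight
    if total == 0:
        return 0.0
    internal = defaultdict(float)  # edge weight with both endpoints in the community
    degree = defaultdict(float)    # sum of weighted degrees of the community's nodes
    for (u, v), w in weights.items():
        cu, cv = community[u], community[v]
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            internal[cu] += w
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)


if __name__ == "__main__":
    # Two triangles of binary clauses, {1,2,3} and {4,5,6}, joined by (x3 OR x4).
    clauses = [[1, -2], [2, 3], [-1, 3], [4, 5], [-5, 6], [4, -6], [3, 4]]
    vig = variable_incidence_graph(clauses)
    split = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
    print(round(modularity(vig, split), 4))  # 0.3571
```

On this two-triangle formula, the natural split {1, 2, 3} / {4, 5, 6} scores Q = 5/14 ≈ 0.36. Placing every variable in a single community scores Q = 0. This matches the intuition that high modularity means most edge weight stays inside communities.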
