Towards Industrial-Like Random SAT Instances

We focus on the random generation of SAT instances whose computational properties are similar to those of real-world instances. It is known that industrial instances, even those with a large number of variables, can be solved by a clever solver in a reasonable amount of time. This is not possible, in general, with classical randomly generated instances. We provide different generation models for SAT instances that extend the uniform and regular 3-CNF models. They are based on the use of non-uniform probability distributions to select variables. Our last model also uses a mechanism to produce clauses of different lengths, as in industrial instances. We show the existence of the phase transition phenomenon for our models, and we study the hardness of the generated instances as a function of the parameters of the probability distributions. We prove that, with these parameters, we can adjust the difficulty of the problems at the phase transition point. We measure hardness in terms of the performance of different solvers. We show how these models allow us to generate random instances that resemble industrial instances, which are of interest for testing purposes.
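
As an illustration of selecting variables with a non-uniform probability distribution, the following is a minimal sketch and not the generation models of the paper. The abstract does not say which distributions are used, so the power-law weighting, the exponent `beta`, and the function names `power_law_weights` and `random_kcnf` are all assumptions made here for the example. With `beta = 0` the sketch reduces to the classical uniform k-CNF model.

```python
import random


def power_law_weights(n_vars, beta):
    """Hypothetical weighting: variable i gets weight i**(-beta).

    beta = 0 gives uniform weights; larger beta concentrates
    occurrences on the low-numbered variables.
    """
    return [(i + 1) ** -float(beta) for i in range(n_vars)]


def random_kcnf(n_vars, n_clauses, k=3, beta=0.0, seed=None):
    """Generate a random k-CNF formula as a list of clauses.

    Each clause is a list of k nonzero integers (DIMACS-style literals)
    over k distinct variables. Variables are drawn according to the
    assumed power-law weights; each sign is chosen with probability 1/2.
    """
    if k > n_vars:
        raise ValueError("k must not exceed the number of variables")
    rng = random.Random(seed)
    variables = list(range(1, n_vars + 1))
    weights = power_law_weights(n_vars, beta)
    formula = []
    for _ in range(n_clauses):
        chosen = set()
        # Redraw until the clause has k distinct variables.
        while len(chosen) < k:
            chosen.add(rng.choices(variables, weights=weights)[0])
        clause = [v if rng.random() < 0.5 else -v for v in sorted(chosen)]
        formula.append(clause)
    return formula


if __name__ == "__main__":
    # Print a small skewed instance in DIMACS CNF format.
    f = random_kcnf(n_vars=20, n_clauses=60, k=3, beta=1.0, seed=1)
    print(f"p cnf 20 {len(f)}")
    for clause in f:
        print(" ".join(map(str, clause)), 0)
```

Varying `beta` while holding the clause-to-variable ratio fixed is one way to explore, with any solver that reads DIMACS input, how the skew of the distribution affects solving time.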
