How to refute a random CSP

17 May 2015

Let P be a nontrivial k-ary predicate over a finite alphabet. Consider a random CSP(P) instance I over n variables with m constraints, each being P applied to k random literals. When m ≫ n the instance I will be unsatisfiable with high probability, and the natural associated algorithmic task is to find a refutation of I, i.e., a certificate of unsatisfiability. When P is the 3-ary Boolean OR predicate, this is the well-studied problem of refuting random 3-SAT formulas; in this case, an efficient algorithm is known only when m ≫ n^{3/2}. Understanding the density required for average-case refutation of other predicates is of importance for various areas of complexity, including cryptography, proof complexity, and learning theory. The main previously known result is that for a general Boolean k-ary predicate P, having m ≫ n^{⌈k/2⌉} random constraints suffices for efficient refutation. In this work we give a general criterion for arbitrary k-ary predicates, one that often yields efficient refutation algorithms at much lower densities. Specifically, if P fails to support a t-wise independent (uniform) probability distribution (2 ≤ t ≤ k), then there is an efficient algorithm that refutes random CSP(P) instances I with high probability, provided m ≫ n^{t/2}. Indeed, our algorithm will “somewhat strongly” refute I, certifying Opt(I) ≤ 1 − Ω_k(1); if t = k then we furthermore get the strongest possible refutation, certifying Opt(I) ≤ E[P] + o(1). This last result is new even in the context of random k-SAT. Regarding the optimality of our m ≫ n^{t/2} density requirement, prior work on SDP hierarchies has given some evidence that efficient refutation of random CSP(P) may be impossible when m ≪ n^{t/2}. Thus there is an indication our algorithm’s dependence on m is optimal for every P, at least in the context of SDP hierarchies. Along these lines, we show that our refutation algorithm can be carried out by the O(1)-round SOS SDP hierarchy.
Finally, as an application of our result, we falsify the “SRCSP assumptions” used to show various hardness-of-learning results in the recent (STOC 2014) work of Daniely, Linial, and Shalev-Shwartz.

Department of Computer Science, Carnegie Mellon. {srallen,odonnell,dwitmer}@cs.cmu.edu. Supported by NSF grants CCF-0747250 and CCF-1116594. Some of this work performed while the second-named author was at the Boğaziçi University Computer Engineering Department, supported by Marie Curie International Incoming Fellowship project number 626373. The first and third named authors were partially supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1252522.

1 On refutation of random CSPs

Constraint satisfaction problems (CSPs) play a major role in computer science. There is a vast theory [BJK05] of how algebraic properties of a CSP predicate affect its worst-case satisfiability complexity; there is a similarly vast theory [Rag09] of worst-case approximability of CSPs. Finally, there is a rich range of research (from the fields of computer science, mathematics, and physics) on the average-case complexity of random CSPs; see [Ach09] for a survey just of random k-SAT.

This paper is concerned with random CSPs, and in particular the problem of efficiently refuting satisfiability for random instances. This is a well-studied algorithmic task with connections to, e.g., proof complexity [BB02], inapproximability [Fei02], SAT-solvers [SAT], cryptography [ABW10], learning theory [DLSS14], statistical physics [CLP02], and complexity theory [BKS13].

Historically, random CSPs are probably best studied in the case of k-SAT, k ≥ 3. The model here involves choosing a CNF formula I over n variables by drawing m clauses (ORs of k literals) independently and uniformly at random. (The precise details of the random model are inessential; see Section 3.1 for more information.)
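
The random k-SAT model just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's precise model (for that, see Section 3.1): the function names `random_ksat` and `fraction_satisfied`, the signed-integer encoding of literals, and the choice to draw k distinct variables per clause are all hypothetical choices made here for concreteness.

```python
import random

def random_ksat(n, m, k, rng=None):
    """Draw m clauses independently; each clause is an OR of k random literals.

    Assumption (ours): each clause uses k distinct variables from 1..n,
    each negated independently with probability 1/2. A literal is encoded
    as +v (variable v) or -v (its negation).
    """
    if rng is None:
        rng = random.Random()
    clauses = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), k)  # k distinct variables
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        clauses.append(clause)
    return clauses

def fraction_satisfied(clauses, assignment):
    """Fraction of clauses satisfied; assignment maps variable -> bool."""
    satisfied = sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )
    return satisfied / len(clauses)
```

For example, `random_ksat(n=100, m=1000, k=3)` draws an instance with m/n = 10. Under this sketch's sign convention, any fixed assignment falsifies a random 3-clause with probability 1/8, so `fraction_satisfied` is about 7/8 in expectation.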
This random model is one of the best-known efficient ways of generating hard-seeming instances of NP-complete and coNP-complete problems. The computational hardness depends crucially on the density, α = m/n. For each k there is (conjecturally) a constant critical density α_k such that I is satisfiable with high probability when α < α_k, and I is unsatisfiable with high probability when α > α_k. (Here and throughout, “with high probability (whp)” means with probability 1 − o(1) as n → ∞.) This phenomenon occurs for all nontrivial random CSPs; in the case of k-SAT it’s been rigorously proven [DSS15] for sufficiently large k.

There is a natural algorithmic task associated with the two regimes. When α < α_k one wants to find a satisfying assignment for I. When α > α_k one wants to refute I; i.e., find a certificate of unsatisfiability. Most heuristic SAT-solvers use DPLL-based algorithms; on unsatisfiable instances, they produce certificates that can be viewed as refutations within the Resolution proof system. More generally, a refutation algorithm for density α is any algorithm that: (a) outputs “unsatisfiable” or “fail”; (b) never incorrectly outputs “unsatisfiable”; (c) outputs “fail” with low probability (i.e., probability o(1)).[1]

Empirical work suggests that as α increases towards α_k, finding satisfying assignments becomes more difficult; and conversely, as α increases beyond α_k, finding certificates of unsatisfiability gradually becomes easier. A seminal paper of Chvátal and Szemerédi [CS88] showed that for any sufficiently large integer c (depending on k), a random k-SAT instance with m = cn requires Resolution refutations of size 2^{Ω(n)} (whp). On the other hand, Fu [Fu96] showed that polynomial-size Resolution refutations exist (whp) once m ≥ O(n^{k−1}); Beame et al.
[BKPS99] subsequently showed that such proofs could be found efficiently.[2]

[1] We caution the reader that in this paper we do not consider the related, but distinct, scenario of distinguishing planted random instances from truly random ones.
[2] In this paper we use the following not-fully-standard terminology: A statement of the form "If f(n) ≥ O(g(n)) then X" means that there exists a certain function h(n), with h(n) being O(g(n)), such that the statement "If f(n) ≥ h(n) then X" is true. We also use Õ(f(n)) to denote O(f(n) · polylog(f(n))), and O_k(f(n)) to denote that the hidden constant has a dependence on k (most often of the form 2).

A breakthrough came in 2001, when Goerdt and Krivelevich [GK01] abandoned combinatorial refutations for spectral ones, showing that random k-SAT instances can be efficiently refuted when m ≥ Õ(n^{⌈k/2⌉}). Soon thereafter, Friedman and Goerdt [FG01] (see also [FGK05]) showed that for 3-SAT, efficient spectral refutations exist once m ≥ n^{3/2+ε} (for any ε > 0). These densities for k-SAT (around n^{3/2} for 3-SAT and n^{⌈k/2⌉} in general) have not been fundamentally improved upon in the last 14 years.[3] (See Table 1 for a more detailed history of results in this

