Approximate resilience, monotonicity, and the complexity of agnostic learning

A function $f$ is $d$-resilient if all its Fourier coefficients of degree at most $d$ are zero, i.e., $f$ is uncorrelated with all low-degree parities. We study the notion of approximate resilience of Boolean functions, where we say that $f$ is $\alpha$-approximately $d$-resilient if $f$ is $\alpha$-close to a $[-1,1]$-valued $d$-resilient function in $\ell_1$ distance. We show that approximate resilience essentially characterizes the complexity of agnostic learning of a concept class $C$ over the uniform distribution. Roughly speaking, if all functions in a class $C$ are far from being $d$-resilient, then $C$ can be learned agnostically in time $n^{O(d)}$; conversely, if $C$ contains a function close to being $d$-resilient, then agnostic learning of $C$ in the statistical query (SQ) framework of Kearns has complexity at least $n^{\Omega(d)}$. This characterization is based on a duality that we establish between $\ell_1$ approximation by degree-$d$ polynomials and approximate $d$-resilience. In particular, it implies that $\ell_1$ approximation by low-degree polynomials, known to be sufficient for agnostic learning over product distributions, is in fact necessary. Focusing on monotone Boolean functions, we prove the existence of near-optimal $\alpha$-approximately $\widetilde{\Omega}(\alpha\sqrt{n})$-resilient monotone functions for all $\alpha>0$. Prior to our work, it was even conceivable that every monotone function is $\Omega(1)$-far from any $1$-resilient function. Furthermore, we construct simple, explicit monotone functions based on ${\sf Tribes}$ and ${\sf CycleRun}$ that are close to highly resilient functions. Our constructions are based on a fairly general resilience analysis and amplification. These structural results, together with the characterization, imply nearly optimal lower bounds for agnostic learning of monotone juntas.
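To make the definition of $d$-resilience concrete, the following is a minimal Python sketch that checks exact (not approximate) resilience by brute force. It assumes $f$ is given as a function on $\{-1,1\}^n$ and enumerates all $2^n$ inputs, so it is only practical for small $n$. The names `fourier_coefficient`, `is_resilient`, `parity`, and `majority` are illustrative and do not come from the paper.

```python
from itertools import combinations, product
from math import prod


def fourier_coefficient(f, n, S):
    """Return E_x[f(x) * prod_{i in S} x_i] over uniform x in {-1,1}^n."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        # prod over an empty S is 1, so S = () gives the mean of f.
        total += f(x) * prod(x[i] for i in S)
    return total / 2 ** n


def is_resilient(f, n, d, tol=1e-12):
    """Check that every Fourier coefficient of degree at most d is zero."""
    for k in range(d + 1):
        for S in combinations(range(n), k):
            if abs(fourier_coefficient(f, n, S)) > tol:
                return False
    return True


def parity(x):
    # Parity of all n bits: its only nonzero coefficient has degree n.
    return prod(x)


def majority(x):
    # Majority for odd n; a monotone function with nonzero degree-1 coefficients.
    return 1 if sum(x) > 0 else -1


if __name__ == "__main__":
    print(is_resilient(parity, 3, 2))    # parity on 3 bits is 2-resilient
    print(is_resilient(majority, 3, 1))  # majority on 3 bits is not 1-resilient
```

Parity on $n$ bits is the standard example of an $(n-1)$-resilient function, while majority on three bits is balanced but correlates with each single coordinate, so it fails already at $d = 1$. Checking approximate resilience would additionally require optimizing over $[-1,1]$-valued $d$-resilient functions, which this sketch does not attempt.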
