Tight Bounds on ℓ1 Approximation and Learning of Self-Bounding Functions

We study the complexity of learning and approximation of self-bounding functions over the uniform distribution on the Boolean hypercube $\{0,1\}^n$. Informally, a function $f:\{0,1\}^n \rightarrow \mathbb{R}$ is self-bounding if for every $x \in \{0,1\}^n$, $f(x)$ upper-bounds the sum of all $n$ marginal decreases in the value of the function at $x$. Self-bounding functions include such well-known classes as submodular and fractionally-subadditive (XOS) functions. They were introduced by Boucheron et al. (2000) in the context of concentration-of-measure inequalities. Our main result is a nearly tight $\ell_1$-approximation of self-bounding functions by low-degree juntas. Specifically, every self-bounding function can be $\epsilon$-approximated in $\ell_1$ by a polynomial of degree $\tilde{O}(1/\epsilon)$ over $2^{\tilde{O}(1/\epsilon)}$ variables. We show that both the degree and the junta size are optimal up to logarithmic factors. Previous techniques considered the stronger $\ell_2$ approximation and proved nearly tight bounds of $\Theta(1/\epsilon^{2})$ on the degree and $2^{\Theta(1/\epsilon^2)}$ on the number of variables. Our bounds rely on an analysis of the noise stability of self-bounding functions together with a stronger connection between noise stability and $\ell_1$ approximation by low-degree polynomials. This technique can also be used to obtain tighter bounds on $\ell_1$ approximation by low-degree polynomials and faster learning algorithms for halfspaces. These results lead to improved, and in several cases almost tight, bounds for PAC and agnostic learning of self-bounding functions relative to the uniform distribution. In particular, assuming hardness of learning juntas, we show that PAC and agnostic learning of self-bounding functions have complexity $n^{\tilde{\Theta}(1/\epsilon)}$.
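The self-bounding condition above can be checked directly on small cubes. The sketch below is illustrative only (the helper name `is_self_bounding` is our own, and it uses just the sum condition stated informally in the abstract, brute-forcing all $2^n$ points):

```python
from itertools import product

def is_self_bounding(f, n, tol=1e-9):
    """Brute-force check of the condition from the abstract: for every
    x in {0,1}^n, f(x) upper-bounds the sum over coordinates i of the
    marginal decrease f(x) - min(f(x), f(x with bit i flipped)).
    Exponential in n; for illustration only."""
    for x in product((0, 1), repeat=n):
        fx = f(x)
        decrease = 0.0
        for i in range(n):
            y = list(x)
            y[i] = 1 - y[i]
            # flipping bit i contributes only when it lowers the value
            decrease += fx - min(fx, f(tuple(y)))
        if decrease > fx + tol:
            return False
    return True

# sum(x) is modular (hence submodular, hence self-bounding): each 1-bit
# contributes a marginal decrease of exactly 1, so the decreases sum to f(x).
print(is_self_bounding(lambda x: float(sum(x)), 4))        # True

# sum(x)^2 is not: e.g. at x = (1,1,0,0), f(x) = 4 but the decreases sum to 6.
print(is_self_bounding(lambda x: float(sum(x)) ** 2, 4))   # False
```

The second example shows why the class is nontrivial: squaring a self-bounding function can inflate the marginal decreases past the function value.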

[1] Noam Nisan, et al. CREW PRAMs and decision trees, 1989, STOC '89.

[2] Nathan Linial, et al. The influence of variables on Boolean functions, 1988, 29th Annual Symposium on Foundations of Computer Science (FOCS).

[3] David P. Williamson, et al. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, 1995, JACM.

[4] Adam Tauman Kalai, et al. Agnostically learning decision trees, 2008, STOC.

[5] Tim Roughgarden, et al. Sketching valuation functions, 2012, SODA.

[6] G. Nemhauser, et al. Exceptional Paper—Location of Bank Accounts to Optimize Float: An Analytic Study of Exact and Approximate Algorithms, 1977.

[7] Li-Yang Tan, et al. Approximate resilience, monotonicity, and the complexity of agnostic learning, 2014, SODA.

[8] Robert E. Schapire, et al. Efficient distribution-free learning of probabilistic concepts, 1990, 31st Annual Symposium on Foundations of Computer Science (FOCS).

[9] Jan Vondrák, et al. A note on concentration of submodular functions, 2010, arXiv.

[10] Rocco A. Servedio, et al. Bounded Independence Fools Halfspaces, 2009, 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS).

[11] Yishay Mansour, et al. Weakly learning DNF and characterizing statistical query learning using Fourier analysis, 1994, STOC '94.

[12] S. Boucheron, et al. A sharp concentration inequality with applications, 1999, Random Struct. Algorithms.

[13] Gregory Valiant, et al. Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas, 2012, 53rd Annual Symposium on Foundations of Computer Science (FOCS).

[14] Daniel Lehmann, et al. Combinatorial auctions with decreasing marginal utilities, 2001, EC '01.

[15] Maria-Florina Balcan, et al. Learning Valuation Functions, 2011, COLT.

[16] Pat Langley, et al. Selection of Relevant Features and Examples in Machine Learning, 1997, Artif. Intell.

[17] Satoru Iwata, et al. A combinatorial strongly polynomial algorithm for minimizing submodular functions, 2001, JACM.

[18] Pravesh Kothari, et al. Learning Coverage Functions, 2013, arXiv.

[19] Pravesh Kothari, et al. Submodular functions are noise stable, 2012, SODA.

[20] Uriel Feige, et al. On maximizing welfare when utility functions are subadditive, 2006, STOC '06.

[21] Colin McDiarmid, et al. Concentration for self-bounding functions and an inequality of Talagrand, 2006.

[22] Ryan O'Donnell, et al. Learning functions of k relevant variables, 2004, J. Comput. Syst. Sci.

[23] Linda Sellie, et al. Toward efficient agnostic learning, 1992, COLT '92.

[24] Maria-Florina Balcan, et al. Submodular Functions: Learnability, Structure, and Optimization, 2010, SIAM J. Comput.

[25] Maurice Queyranne, et al. A combinatorial algorithm for minimizing symmetric submodular functions, 1995, SODA '95.

[26] Jan Vondrák, et al. Tight Bounds on Low-Degree Spectral Concentration of Submodular and XOS Functions, 2015, 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS).

[27] Eric Blais, et al. Testing Submodularity and Other Properties of Valuation Functions, 2017, ITCS.

[28] Ambuj Tewari, et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization, 2008, NIPS.

[29] David Haussler, et al. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications, 1992, Inf. Comput.

[30] Pravesh Kothari, et al. Representation, Approximation and Learning of Submodular Functions Using Low-rank Decision Trees, 2013, COLT.

[31] Pravesh Kothari, et al. Learning Coverage Functions and Private Release of Marginals, 2014, COLT.

[32] Vahab Mirrokni, et al. Maximizing Non-Monotone Submodular Functions, 2007, FOCS.

[33] Ryan O'Donnell, et al. Analysis of Boolean Functions, 2014, arXiv.

[34] Sofya Raskhodnikova, et al. Learning pseudo-Boolean k-DNF and submodular functions, 2013, SODA.

[35] F. Dunstan. Matroids and Submodular Functions, 1976.

[36] Vahab S. Mirrokni, et al. Approximating submodular functions everywhere, 2009, SODA.

[37] Aaron Roth, et al. Privately releasing conjunctions and the statistical query barrier, 2010, STOC '11.

[38] Jan Vondrák, et al. Optimal Bounds on Approximation of Submodular and XOS Functions by Juntas, 2016, SIAM J. Comput.

[39] Nathan Linial, et al. Collective coin flipping, robust voting schemes and minima of Banzhaf values, 1985, 26th Annual Symposium on Foundations of Computer Science (FOCS).

[40] Ehud Friedgut, et al. Boolean Functions With Low Average Sensitivity Depend On Few Coordinates, 1998, Comb.

[41] Rocco A. Servedio, et al. Agnostically learning halfspaces, 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS '05).

[42] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.