Tight Bounds on ℓ1 Approximation and Learning of Self-Bounding Functions

We study the complexity of learning and approximation of self-bounding functions over the uniform distribution on the Boolean hypercube $\{0,1\}^n$. Informally, a function $f:\{0,1\}^n \rightarrow \mathbb{R}$ is self-bounding if for every $x \in \{0,1\}^n$, $f(x)$ upper-bounds the sum of all $n$ marginal decreases in the value of the function at $x$. Self-bounding functions include such well-known classes as submodular and fractionally-subadditive (XOS) functions. They were introduced by Boucheron et al. (2000) in the context of concentration-of-measure inequalities. Our main result is a nearly tight $\ell_1$-approximation of self-bounding functions by low-degree juntas. Specifically, every self-bounding function can be $\epsilon$-approximated in $\ell_1$ by a polynomial of degree $\tilde{O}(1/\epsilon)$ over $2^{\tilde{O}(1/\epsilon)}$ variables. We show that both the degree and the junta size are optimal up to logarithmic factors. Previous techniques considered the stronger $\ell_2$ approximation and proved nearly tight bounds of $\Theta(1/\epsilon^{2})$ on the degree and $2^{\Theta(1/\epsilon^2)}$ on the number of variables. Our bounds rely on an analysis of the noise stability of self-bounding functions together with a stronger connection between noise stability and $\ell_1$ approximation by low-degree polynomials. This technique can also be used to obtain tighter bounds on $\ell_1$ approximation by low-degree polynomials and faster learning algorithms for halfspaces. These results lead to improved, and in several cases almost tight, bounds for PAC and agnostic learning of self-bounding functions relative to the uniform distribution. In particular, assuming hardness of learning juntas, we show that PAC and agnostic learning of self-bounding functions have complexity $n^{\tilde{\Theta}(1/\epsilon)}$.
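The self-bounding condition above can be checked directly on small cubes. The sketch below is illustrative only (the helper name `is_self_bounding` is our own, and it uses just the sum condition stated informally in the abstract, brute-forcing all $2^n$ points):

```python
from itertools import product

def is_self_bounding(f, n, tol=1e-9):
    """Brute-force check of the condition from the abstract: for every
    x in {0,1}^n, f(x) upper-bounds the sum over coordinates i of the
    marginal decrease f(x) - min(f(x), f(x with bit i flipped)).
    Exponential in n; for illustration only."""
    for x in product((0, 1), repeat=n):
        fx = f(x)
        decrease = 0.0
        for i in range(n):
            y = list(x)
            y[i] = 1 - y[i]
            # flipping bit i contributes only when it lowers the value
            decrease += fx - min(fx, f(tuple(y)))
        if decrease > fx + tol:
            return False
    return True

# sum(x) is modular (hence submodular, hence self-bounding): each 1-bit
# contributes a marginal decrease of exactly 1, so the decreases sum to f(x).
print(is_self_bounding(lambda x: float(sum(x)), 4))        # True

# sum(x)^2 is not: e.g. at x = (1,1,0,0), f(x) = 4 but the decreases sum to 6.
print(is_self_bounding(lambda x: float(sum(x)) ** 2, 4))   # False
```

The second example shows why the class is nontrivial: squaring a self-bounding function can inflate the marginal decreases past the function value.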

[1] Noam Nisan, et al. CREW PRAMs and decision trees, 1989, STOC '89.

[2] Nathan Linial, et al. The influence of variables on Boolean functions, 1988, 29th Annual Symposium on Foundations of Computer Science (FOCS).

[3] David P. Williamson, et al. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, 1995, JACM.

[4] Adam Tauman Kalai, et al. Agnostically learning decision trees, 2008, STOC.

[5] Tim Roughgarden, et al. Sketching valuation functions, 2012, SODA.

[6] G. Nemhauser, et al. Exceptional Paper—Location of Bank Accounts to Optimize Float: An Analytic Study of Exact and Approximate Algorithms, 1977.

[7] Li-Yang Tan, et al. Approximate resilience, monotonicity, and the complexity of agnostic learning, 2014, SODA.

[8] Robert E. Schapire, et al. Efficient distribution-free learning of probabilistic concepts, 1990, 31st Annual Symposium on Foundations of Computer Science (FOCS).

[9] Jan Vondrák, et al. A note on concentration of submodular functions, 2010, arXiv.

[10] Rocco A. Servedio, et al. Bounded Independence Fools Halfspaces, 2009, 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS).

[11] Yishay Mansour, et al. Weakly learning DNF and characterizing statistical query learning using Fourier analysis, 1994, STOC '94.

[12] S. Boucheron, et al. A sharp concentration inequality with applications, 1999, Random Struct. Algorithms.

[13] Gregory Valiant, et al. Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas, 2012, 53rd Annual Symposium on Foundations of Computer Science (FOCS).

[14] Daniel Lehmann, et al. Combinatorial auctions with decreasing marginal utilities, 2001, EC '01.

[15] Maria-Florina Balcan, et al. Learning Valuation Functions, 2011, COLT.

[16] Pat Langley, et al. Selection of Relevant Features and Examples in Machine Learning, 1997, Artif. Intell.

[17] Satoru Iwata, et al. A combinatorial strongly polynomial algorithm for minimizing submodular functions, 2001, JACM.

[18] Pravesh Kothari, et al. Learning Coverage Functions, 2013, arXiv.

[19] Pravesh Kothari, et al. Submodular functions are noise stable, 2012, SODA.

[20] Uriel Feige, et al. On maximizing welfare when utility functions are subadditive, 2006, STOC '06.

[21] Colin McDiarmid, et al. Concentration for self-bounding functions and an inequality of Talagrand, 2006.

[22] Ryan O'Donnell, et al. Learning functions of k relevant variables, 2004, J. Comput. Syst. Sci.

[23] Linda Sellie, et al. Toward efficient agnostic learning, 1992, COLT '92.

[24] Maria-Florina Balcan, et al. Submodular Functions: Learnability, Structure, and Optimization, 2010, SIAM J. Comput.

[25] Maurice Queyranne, et al. A combinatorial algorithm for minimizing symmetric submodular functions, 1995, SODA '95.

[26] Jan Vondrák, et al. Tight Bounds on Low-Degree Spectral Concentration of Submodular and XOS Functions, 2015, 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS).

[27] Eric Blais, et al. Testing Submodularity and Other Properties of Valuation Functions, 2017, ITCS.

[28] Ambuj Tewari, et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization, 2008, NIPS.

[29] David Haussler, et al. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications, 1992, Inf. Comput.

[30] Pravesh Kothari, et al. Representation, Approximation and Learning of Submodular Functions Using Low-rank Decision Trees, 2013, COLT.

[31] Pravesh Kothari, et al. Learning Coverage Functions and Private Release of Marginals, 2014, COLT.

[32] Vahab Mirrokni, et al. Maximizing Non-Monotone Submodular Functions, 2007, FOCS.

[33] Ryan O'Donnell, et al. Analysis of Boolean Functions, 2014, arXiv.

[34] Sofya Raskhodnikova, et al. Learning pseudo-Boolean k-DNF and submodular functions, 2013, SODA.

[35] F. Dunstan. Matroids and Submodular Functions, 1976.

[36] Vahab S. Mirrokni, et al. Approximating submodular functions everywhere, 2009, SODA.

[37] Aaron Roth, et al. Privately releasing conjunctions and the statistical query barrier, 2010, STOC '11.

[38] Jan Vondrák, et al. Optimal Bounds on Approximation of Submodular and XOS Functions by Juntas, 2016, SIAM J. Comput.

[39] Nathan Linial, et al. Collective coin flipping, robust voting schemes and minima of Banzhaf values, 1985, 26th Annual Symposium on Foundations of Computer Science (FOCS).

[40] Ehud Friedgut, et al. Boolean Functions With Low Average Sensitivity Depend On Few Coordinates, 1998, Comb.

[41] Rocco A. Servedio, et al. Agnostically learning halfspaces, 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS '05).

[42] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.