Representation, Approximation and Learning of Submodular Functions Using Low-rank Decision Trees

We study the complexity of approximate representation and learning of submodular functions over the uniform distribution on the Boolean hypercube $\{0,1\}^n$. Our main result is the following structural theorem: any submodular function is $\epsilon$-close in $\ell_2$ to a real-valued decision tree (DT) of depth $O(1/\epsilon^2)$. This immediately implies that any submodular function is $\epsilon$-close to a function that depends on at most $2^{O(1/\epsilon^2)}$ variables and has a spectral $\ell_1$ norm of $2^{O(1/\epsilon^2)}$. It also implies the closest previous result, which states that submodular functions can be approximated by polynomials of degree $O(1/\epsilon^2)$ (Cheraghchi et al., 2012). Our result is proved by constructing an approximation of a submodular function by a DT of rank $4/\epsilon^2$ and by proving that any rank-$r$ DT can be $\epsilon$-approximated by a DT of depth $\frac{5}{2}(r+\log(1/\epsilon))$. We show that these structural results can be exploited to give an attribute-efficient PAC learning algorithm for submodular functions running in time $\tilde{O}(n^2) \cdot 2^{O(1/\epsilon^{4})}$. The best previous algorithm for the problem requires $n^{O(1/\epsilon^{2})}$ time and examples (Cheraghchi et al., 2012) but also works in the agnostic setting. In addition, we give improved learning algorithms for a number of related settings. We also prove that our PAC and agnostic learning algorithms are essentially optimal via two lower bounds: (1) an information-theoretic lower bound of $2^{\Omega(1/\epsilon^{2/3})}$ on the complexity of learning monotone submodular functions in any reasonable model; (2) a computational lower bound of $n^{\Omega(1/\epsilon^{2/3})}$ based on a reduction from learning sparse parities with noise, a problem widely believed to be intractable. These are the first lower bounds for learning submodular functions over the uniform distribution.
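The structural result above turns on the gap between the rank and the depth of a decision tree. Assuming the standard Ehrenfeucht-Haussler notion of rank (cf. [37]), the Python sketch below computes both quantities for a real-valued DT and evaluates the depth bound $\frac{5}{2}(r+\log(1/\epsilon))$ quoted in the abstract. The class and function names (`Leaf`, `Node`, `rank`, `depth_bound`, `or_tree`, `complete_tree`) are hypothetical illustrations, not code from the paper, and the base-2 logarithm is an assumption.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    value: float  # real-valued output of the tree


@dataclass
class Node:
    var: int    # index i of the variable x_i queried at this node
    zero: Tree  # subtree followed when x_i = 0
    one: Tree   # subtree followed when x_i = 1


Tree = Union[Leaf, Node]


def evaluate(t: Tree, x: Sequence[int]) -> float:
    """Follow the path selected by x down to a leaf and return its value."""
    while isinstance(t, Node):
        t = t.one if x[t.var] else t.zero
    return t.value


def depth(t: Tree) -> int:
    """Length of the longest root-to-leaf path."""
    if isinstance(t, Leaf):
        return 0
    return 1 + max(depth(t.zero), depth(t.one))


def rank(t: Tree) -> int:
    """Ehrenfeucht-Haussler rank (assumed definition): 0 at a leaf; at an
    internal node with subtree ranks r0, r1: max(r0, r1) if r0 != r1, else r0 + 1."""
    if isinstance(t, Leaf):
        return 0
    r0, r1 = rank(t.zero), rank(t.one)
    return max(r0, r1) if r0 != r1 else r0 + 1


def depth_bound(r: int, eps: float) -> float:
    """The depth (5/2)(r + log(1/eps)) from the abstract; log base 2 is assumed."""
    return 2.5 * (r + math.log2(1.0 / eps))


def or_tree(n: int) -> Tree:
    """Hypothetical example: OR of x_0..x_{n-1} as a decision list
    (query x_0, x_1, ... in turn; output 1 at the first x_i = 1, else 0)."""
    t: Tree = Leaf(0.0)
    for i in reversed(range(n)):
        t = Node(i, zero=t, one=Leaf(1.0))
    return t


def complete_tree(d: int, start: int = 0) -> Tree:
    """Hypothetical example: complete tree of depth d with all leaves 0.0."""
    if d == 0:
        return Leaf(0.0)
    child_zero = complete_tree(d - 1, start + 1)
    child_one = complete_tree(d - 1, start + 1)
    return Node(start, child_zero, child_one)


if __name__ == "__main__":
    dl = or_tree(10)
    print(depth(dl), rank(dl))                 # 10 1
    print(evaluate(dl, [0, 0, 1] + [0] * 7))   # 1.0
    ct = complete_tree(3)
    print(depth(ct), rank(ct))                 # 3 3
    print(depth_bound(4, 0.25))                # 15.0
```

The OR example (a monotone submodular function) is represented by a tree of rank 1 but depth $n$, so rank can be far smaller than depth; rank never exceeds depth, which is why converting a low-rank tree into a shallow approximating tree is the nontrivial step.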
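To make the objects in the abstract concrete for tiny $n$, the brute-force Python sketch below identifies $x \in \{0,1\}^n$ with the set $S = \{i : x_i = 1\}$ (bit $i$ of an integer), checks submodularity via the standard local condition $f(S \cup \{i\}) + f(S \cup \{j\}) \ge f(S) + f(S \cup \{i,j\})$ for $i \ne j \notin S$, and computes the Fourier coefficients $\hat f(T) = \mathbf{E}_x[f(x)(-1)^{\sum_{i \in T} x_i}]$ under the uniform distribution together with the spectral $\ell_1$ norm $\sum_T |\hat f(T)|$. It runs in time exponential in $n$ and is only an illustration of the definitions, not the paper's learning algorithm; the example functions (`or_fn`, `coverage`, `square_of_size`) and the set system `SETS` are hypothetical.

```python
from typing import Callable, List

SetFunction = Callable[[int], float]  # f maps an n-bit mask x (bit i = x_i) to a real


def is_submodular(f: SetFunction, n: int, tol: float = 1e-12) -> bool:
    """Check f(S+i) + f(S+j) >= f(S) + f(S+i+j) for every S and i != j outside S."""
    for s in range(1 << n):
        for i in range(n):
            if (s >> i) & 1:
                continue
            for j in range(i + 1, n):
                if (s >> j) & 1:
                    continue
                si, sj = s | (1 << i), s | (1 << j)
                if f(si) + f(sj) < f(s) + f(si | sj) - tol:
                    return False
    return True


def fourier_coefficients(f: SetFunction, n: int) -> List[float]:
    """hat f(T) = E_x[f(x) * (-1)^{sum_{i in T} x_i}] with x uniform on {0,1}^n."""
    size = 1 << n
    values = [f(x) for x in range(size)]
    coeffs = []
    for t in range(size):
        total = 0.0
        for x, v in enumerate(values):
            parity = bin(x & t).count("1") & 1  # sum of x_i over i in T, mod 2
            total += -v if parity else v
        coeffs.append(total / size)
    return coeffs


def spectral_l1(f: SetFunction, n: int) -> float:
    """Spectral l1 norm: sum over all T of |hat f(T)|."""
    return sum(abs(c) for c in fourier_coefficients(f, n))


def or_fn(x: int) -> float:
    return 1.0 if x else 0.0  # OR of the bits: monotone submodular


def square_of_size(x: int) -> float:
    return float(bin(x).count("1") ** 2)  # |S|^2 is supermodular, not submodular


SETS = [{0, 1}, {1, 2}, {2, 3}, {3, 4}]  # hypothetical coverage instance


def coverage(x: int) -> float:
    """Fraction of the universe {0,...,4} covered by the chosen sets."""
    covered = set()
    for i, u in enumerate(SETS):
        if (x >> i) & 1:
            covered |= u
    return len(covered) / 5.0


if __name__ == "__main__":
    print(is_submodular(or_fn, 3), is_submodular(coverage, 4),
          is_submodular(square_of_size, 3))  # True True False
    print(spectral_l1(or_fn, 3))             # 1.75
```

For OR on $n$ variables one can check by hand that $\hat f(\emptyset) = 1 - 2^{-n}$ and $\hat f(T) = -2^{-n}$ for $T \ne \emptyset$, so the spectral $\ell_1$ norm is $2 - 2^{1-n}$, which gives $1.75$ for $n = 3$.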

Pravesh Kothari | Jan Vondrák | Vitaly Feldman

[1] Vitaly Feldman. Attribute-Efficient and Non-adaptive Learning of Parities and DNF Expressions. J. Mach. Learn. Res., 2007.

[2] Eyal Kushilevitz et al. Learning decision trees using the Fourier spectrum. STOC, 1991.

[3] S. Boucheron et al. A sharp concentration inequality with applications. Random Struct. Algorithms, 1999.

[4] Gregory Valiant et al. Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas. FOCS, 2012.

[5] G. Nemhauser et al. Location of Bank Accounts to Optimize Float: An Analytic Study of Exact and Approximate Algorithms. 1977.

[6] David P. Williamson et al. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM, 1995.

[7] Daniel Lehmann et al. Combinatorial auctions with decreasing marginal utilities. EC, 2001.

[8] Vitaly Feldman et al. A Complete Characterization of Statistical Query Learning with Applications to Evolvability. FOCS, 2009.

[9] Vahab S. Mirrokni et al. Approximating submodular functions everywhere. SODA, 2009.

[10] David Haussler et al. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications. Inf. Comput., 1992.

[11] Satoru Iwata et al. A combinatorial strongly polynomial algorithm for minimizing submodular functions. J. ACM, 2001.

[12] Leonid A. Levin et al. A hard-core predicate for all one-way functions. STOC, 1989.

[13] Pravesh Kothari et al. Learning Coverage Functions. arXiv, 2013.

[14] Jan Vondrák et al. A note on concentration of submodular functions. arXiv, 2010.

[15] Aaron Roth et al. Privately releasing conjunctions and the statistical query barrier. STOC, 2011.

[16] Adam Tauman Kalai et al. Agnostically learning decision trees. STOC, 2008.

[17] Sofya Raskhodnikova et al. Learning pseudo-Boolean k-DNF and submodular functions. SODA, 2013.

[18] Christos H. Papadimitriou et al. On the Hardness of Being Truthful. FOCS, 2008.

[19] Andreas Krause et al. Submodularity and its applications in optimized information gathering. ACM TIST, 2011.

[20] Noam Nisan et al. Approximation algorithms for combinatorial auctions with complement-free bidders. STOC, 2005.

[21] Mahdi Cheraghchi et al. Submodular functions are noise stable. SODA, 2012.

[22] Jan Vondrák et al. Optimal approximation for the submodular welfare problem in the value oracle model. STOC, 2008.

[23] Vitaly Feldman et al. Distribution-Specific Agnostic Boosting. ICS, 2009.

[24] Tim Roughgarden et al. Sketching valuation functions. SODA, 2012.

[25] László Lovász et al. Submodular functions and convexity. ISMP, 1982.

[26] Andreas Krause et al. Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies. J. Mach. Learn. Res., 2008.

[27] R. Schapire et al. Toward efficient agnostic learning. COLT, 1992.

[28] Noam Nisan et al. Constant depth circuits, Fourier transform, and learnability. FOCS, 1989.

[29] F. Dunstan. Matroids and submodular functions. 1976.

[30] Robert E. Schapire et al. Efficient distribution-free learning of probabilistic concepts. FOCS, 1990.

[31] Maurice Queyranne et al. A combinatorial algorithm for minimizing symmetric submodular functions. SODA, 1995.

[32] Andreas Krause et al. Near-optimal sensor placements in Gaussian processes. ICML, 2005.

[33] Vitaly Feldman et al. On Agnostic Learning of Parities, Monomials, and Halfspaces. SIAM J. Comput., 2009.

[34] C. Guestrin et al. Near-optimal sensor placements: maximizing information while minimizing communication cost. IPSN, 2006.

[35] Satoru Iwata et al. A combinatorial, strongly polynomial-time algorithm for minimizing submodular functions. STOC, 2000.

[36] Adam Tauman Kalai et al. Potential-Based Agnostic Boosting. NIPS, 2009.

[37] David Haussler et al. Learning decision trees from random examples. COLT, 1988.

[38] Rocco A. Servedio et al. Agnostically learning halfspaces. FOCS, 2005.

[39] Leslie G. Valiant et al. A theory of the learnable. STOC, 1984.

[40] Tim Roughgarden et al. From convex optimization to randomized mechanisms: toward optimal combinatorial auctions. STOC, 2011.

[41] Rocco A. Servedio et al. Private data release via learning thresholds. SODA, 2011.

[42] Ryan O'Donnell et al. Learning monotone decision trees in polynomial time. CCC, 2006.

[43] Maria-Florina Balcan et al. Learning Valuation Functions. COLT, 2011.