Representation, Approximation and Learning of Submodular Functions Using Low-rank Decision Trees

We study the complexity of approximate representation and learning of submodular functions over the uniform distribution on the Boolean hypercube $\{0,1\}^n$. Our main result is the following structural theorem: any submodular function is $\epsilon$-close in $\ell_2$ to a real-valued decision tree (DT) of depth $O(1/\epsilon^2)$. This immediately implies that any submodular function is $\epsilon$-close to a function of at most $2^{O(1/\epsilon^2)}$ variables and has a spectral $\ell_1$ norm of $2^{O(1/\epsilon^2)}$. It also implies the closest previous result, which states that submodular functions can be approximated by polynomials of degree $O(1/\epsilon^2)$ (Cheraghchi et al., 2012). We prove our result by constructing an approximation of a submodular function by a DT of rank $4/\epsilon^2$ and then showing that any rank-$r$ DT can be $\epsilon$-approximated by a DT of depth $\frac{5}{2}(r+\log(1/\epsilon))$. We show that these structural results can be exploited to give an attribute-efficient PAC learning algorithm for submodular functions running in time $\tilde{O}(n^2) \cdot 2^{O(1/\epsilon^{4})}$. The best previous algorithm for the problem requires $n^{O(1/\epsilon^{2})}$ time and examples (Cheraghchi et al., 2012) but also works in the agnostic setting. In addition, we give improved learning algorithms for a number of related settings. We also prove that our PAC and agnostic learning algorithms are essentially optimal via two lower bounds: (1) an information-theoretic lower bound of $2^{\Omega(1/\epsilon^{2/3})}$ on the complexity of learning monotone submodular functions in any reasonable model; (2) a computational lower bound of $n^{\Omega(1/\epsilon^{2/3})}$ based on a reduction to learning of sparse parities with noise, which is widely believed to be intractable. These are the first lower bounds for learning of submodular functions over the uniform distribution.
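
As a rough check of how the rank and depth bounds combine into the stated $O(1/\epsilon^2)$ depth, the following sketch substitutes $r = 4/\epsilon^2$ into the depth bound. It is an illustration only: it ignores how the two approximation errors are combined, which the abstract does not specify.

$$\frac{5}{2}\left(\frac{4}{\epsilon^2} + \log\frac{1}{\epsilon}\right) = \frac{10}{\epsilon^2} + \frac{5}{2}\log\frac{1}{\epsilon} = O(1/\epsilon^2),$$

since $\log(1/\epsilon) \le 1/\epsilon^2$ for $\epsilon \in (0,1]$. A binary tree of depth $d$ has at most $2^d - 1$ internal nodes, so it queries at most $2^d - 1$ distinct variables; with $d = O(1/\epsilon^2)$ this is consistent with the $2^{O(1/\epsilon^2)}$ bound on the number of relevant variables.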

[1]  Vitaly Feldman.  Attribute-Efficient and Non-adaptive Learning of Parities and DNF Expressions , 2007, J. Mach. Learn. Res..

[2]  Eyal Kushilevitz,et al.  Learning decision trees using the Fourier spectrum , 1991, STOC '91.

[3]  S. Boucheron,et al.  A sharp concentration inequality with applications , 1999, Random Struct. Algorithms.

[4]  Gregory Valiant,et al.  Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas , 2012, 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science.

[5]  G. Nemhauser,et al.  Exceptional Paper—Location of Bank Accounts to Optimize Float: An Analytic Study of Exact and Approximate Algorithms , 1977 .

[6]  David P. Williamson,et al.  Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming , 1995, JACM.

[7]  Daniel Lehmann,et al.  Combinatorial auctions with decreasing marginal utilities , 2001, EC '01.

[8]  Vitaly Feldman,et al.  A Complete Characterization of Statistical Query Learning with Applications to Evolvability , 2009, 2009 50th Annual IEEE Symposium on Foundations of Computer Science.

[9]  Vahab S. Mirrokni,et al.  Approximating submodular functions everywhere , 2009, SODA.

[10]  David Haussler,et al.  Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications , 1992, Inf. Comput..

[11]  Satoru Iwata,et al.  A combinatorial strongly polynomial algorithm for minimizing submodular functions , 2001, JACM.

[12]  Leonid A. Levin,et al.  A hard-core predicate for all one-way functions , 1989, STOC '89.

[13]  Pravesh Kothari,et al.  Learning Coverage Functions , 2013, ArXiv.

[14]  Jan Vondrák,et al.  A note on concentration of submodular functions , 2010, ArXiv.

[15]  Aaron Roth,et al.  Privately releasing conjunctions and the statistical query barrier , 2010, STOC '11.

[16]  Adam Tauman Kalai,et al.  Agnostically learning decision trees , 2008, STOC.

[17]  Sofya Raskhodnikova,et al.  Learning pseudo-Boolean k-DNF and submodular functions , 2013, SODA.

[18]  Christos H. Papadimitriou,et al.  On the Hardness of Being Truthful , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[19]  Andreas Krause,et al.  Submodularity and its applications in optimized information gathering , 2011, TIST.

[20]  Noam Nisan,et al.  Approximation algorithms for combinatorial auctions with complement-free bidders , 2005, STOC '05.

[21]  Pravesh Kothari,et al.  Submodular functions are noise stable , 2012, SODA.

[22]  Jan Vondrák,et al.  Optimal approximation for the submodular welfare problem in the value oracle model , 2008, STOC.

[23]  Vitaly Feldman,et al.  Distribution-Specific Agnostic Boosting , 2009, ICS.

[24]  Tim Roughgarden,et al.  Sketching valuation functions , 2012, SODA.

[25]  László Lovász,et al.  Submodular functions and convexity , 1982, ISMP.

[26]  Andreas Krause,et al.  Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies , 2008, J. Mach. Learn. Res..

[27]  R. Schapire,et al.  Toward efficient agnostic learning , 1992, COLT '92.

[28]  Noam Nisan,et al.  Constant depth circuits, Fourier transform, and learnability , 1989, 30th Annual Symposium on Foundations of Computer Science.

[29]  F. Dunstan MATROIDS AND SUBMODULAR FUNCTIONS , 1976 .

[30]  Robert E. Schapire,et al.  Efficient distribution-free learning of probabilistic concepts , 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.

[31]  Maurice Queyranne,et al.  A combinatorial algorithm for minimizing symmetric submodular functions , 1995, SODA '95.

[32]  Andreas Krause,et al.  Near-optimal sensor placements in Gaussian processes , 2005, ICML.

[33]  Vitaly Feldman,et al.  On Agnostic Learning of Parities, Monomials, and Halfspaces , 2009, SIAM J. Comput..

[34]  C. Guestrin,et al.  Near-optimal sensor placements: maximizing information while minimizing communication cost , 2006, 2006 5th International Conference on Information Processing in Sensor Networks.

[35]  Satoru Iwata,et al.  A combinatorial, strongly polynomial-time algorithm for minimizing submodular functions , 2000, STOC '00.

[36]  Adam Tauman Kalai,et al.  Potential-Based Agnostic Boosting , 2009, NIPS.

[37]  David Haussler,et al.  Learning decision trees from random examples , 1988, COLT '88.

[38]  Rocco A. Servedio,et al.  Agnostically learning halfspaces , 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[39]  Leslie G. Valiant,et al.  A theory of the learnable , 1984, STOC '84.

[40]  Tim Roughgarden,et al.  From convex optimization to randomized mechanisms: toward optimal combinatorial auctions , 2011, STOC '11.

[41]  Rocco A. Servedio,et al.  Private data release via learning thresholds , 2011, SODA.

[42]  Ryan O'Donnell,et al.  Learning monotone decision trees in polynomial time , 2006, 21st Annual IEEE Conference on Computational Complexity (CCC'06).

[43]  Maria-Florina Balcan,et al.  Learning Valuation Functions , 2011, COLT.