Statistical-Query Lower Bounds via Functional Gradients

We give the first statistical-query lower bounds for agnostically learning any non-polynomial activation with respect to Gaussian marginals (e.g., ReLU, sigmoid, sign). For the specific problem of ReLU regression (equivalently, agnostically learning a ReLU), we show that any statistical-query algorithm with tolerance $n^{-\Theta(\epsilon^{-1/2})}$ must use at least $2^{n^c} \epsilon$ queries for some constant $c > 0$, where $n$ is the dimension and $\epsilon$ is the accuracy parameter. Our results rule out general (as opposed to correlational) SQ learning algorithms, which is unusual for real-valued learning problems. Our techniques involve a gradient boosting procedure for "amplifying" recent lower bounds due to Diakonikolas et al. (COLT 2020) and Goel et al. (ICML 2020) on the SQ dimension of functions computed by two-layer neural networks. The crucial new ingredient is the use of a nonstandard convex functional during the boosting procedure. This also yields a best-possible reduction between two commonly studied models of learning: agnostic learning and probabilistic concepts.
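
The abstract's central tool is boosting viewed as gradient descent in function space. For orientation only, here is a minimal sketch of classical functional-gradient boosting for the squared loss, in the spirit of Friedman [30] and Bartlett et al. [21]; the paper's amplification argument instead uses a nonstandard convex functional and applies it to SQ-hard base functions, neither of which this sketch implements. The helper names (`fit_stump`, `boost`) and the toy ReLU data are illustrative assumptions, not part of the paper.

```python
import numpy as np


def fit_stump(x, r):
    """Fit a depth-1 regression stump (axis-aligned threshold) to the residual vector r."""
    best = None
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j]):
            left = x[:, j] <= t
            if left.all() or not left.any():
                continue
            pred = np.where(left, r[left].mean(), r[~left].mean())
            err = float(((r - pred) ** 2).sum())
            if best is None or err < best[0]:
                best = (err, j, t, float(r[left].mean()), float(r[~left].mean()))
    _, j, t, a, b = best
    return lambda z, j=j, t=t, a=a, b=b: np.where(z[:, j] <= t, a, b)


def boost(x, y, rounds=50, eta=0.3):
    """Greedy functional-gradient descent on F(h) = E[(h(x) - y)^2]:
    each round fits a weak learner to the negative functional gradient of F at
    the current predictor, which for squared loss is simply the residual y - h(x)."""
    f = np.zeros_like(y)
    ensemble = []
    for _ in range(rounds):
        residual = y - f             # negative functional gradient of the squared loss
        g = fit_stump(x, residual)   # weak hypothesis approximating the residual
        f = f + eta * g(x)
        ensemble.append(g)
    return lambda z: eta * sum(g(z) for g in ensemble)


# Usage: noisy data labeled by a ReLU of one coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.maximum(X[:, 0], 0.0) + 0.1 * rng.normal(size=200)
h = boost(X, y)
print("training MSE:", float(np.mean((h(X) - y) ** 2)))
```

The only boosting-specific choice above is that the weak learner is fit to the negative functional gradient (here the residual); replacing the squared loss with a different convex functional, as the paper does, changes only how that gradient is computed.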

[1] Ilias Diakonikolas et al. Approximation Schemes for ReLU Regression. COLT, 2020.

[2] Vitaly Feldman et al. A Complete Characterization of Statistical Query Learning with Applications to Evolvability. FOCS, 2009.

[3] Jeffrey C. Jackson et al. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. FOCS, 1994.

[4] Elad Hazan et al. Introduction to Online Convex Optimization. Foundations and Trends in Optimization, 2016.

[5] John Wilmes et al. Gradient Descent for One-Hidden-Layer Neural Networks: Polynomial Convergence and SQ Lower Bounds. COLT, 2018.

[6] Alexandr Andoni et al. Attribute-efficient learning of monomials over highly-correlated variables. ALT, 2019.

[7] Adam R. Klivans et al. Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals. NeurIPS, 2019.

[8] Yuan Cao et al. Agnostic Learning of a Single Neuron with Gradient Descent. NeurIPS, 2020.

[9] Pravesh Kothari et al. Embedding Hard Learning Problems into Gaussian Space. Electronic Colloquium on Computational Complexity (ECCC), 2014.

[10] Roi Livni et al. On the Computational Efficiency of Training Neural Networks. NIPS, 2014.

[11] Abhishek Panigrahi et al. Effect of Activation Functions on the Training of Overparametrized Neural Nets. ICLR, 2019.

[12] Daniel M. Kane et al. Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals. NeurIPS, 2020.

[13] Robert E. Schapire et al. Efficient distribution-free learning of probabilistic concepts. FOCS, 1990.

[14] Daniel M. Kane et al. Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU Networks. COLT, 2020.

[15] Haipeng Luo et al. Online Gradient Boosting. NIPS, 2015.

[16] Vitaly Feldman et al. Distribution-Specific Agnostic Boosting. ICS, 2009.

[17] Le Song et al. On the Complexity of Learning Neural Networks. NIPS, 2017.

[18] Gilad Yehudai et al. Learning a Single Neuron with Gradient Methods. COLT, 2020.

[19] C. Fonseca et al. Basic trigonometric power sums with applications. arXiv:1601.07839, 2016.

[20] Alexandr Andoni et al. Learning Sparse Polynomial Functions. SODA, 2014.

[21] Peter L. Bartlett et al. Boosting Algorithms as Gradient Descent. NIPS, 1999.

[22] Daniel M. Kane et al. Statistical Query Lower Bounds for Robust Estimation of High-Dimensional Gaussians and Gaussian Mixtures. FOCS, 2017.

[23] E. Hille et al. Contributions to the theory of Hermitian series. II. The representation problem. 1940.

[24] Gilad Yehudai et al. On the Power and Limitations of Random Features for Understanding Neural Networks. NeurIPS, 2019.

[25] Varun Kanade et al. Reliably Learning the ReLU in Polynomial Time. COLT, 2016.

[26] Rocco A. Servedio et al. Agnostically learning halfspaces. FOCS, 2005.

[27] Robert E. Schapire and Yoav Freund. Boosting: Foundations and Algorithms. MIT Press, 2012.

[28] Martin Jaggi et al. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. ICML, 2013.

[29] Noam Nisan et al. Constant depth circuits, Fourier transform, and learnability. FOCS, 1989.

[30] J. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 2001.

[31] Daniel M. Kane et al. Bounded Independence Fools Degree-2 Threshold Functions. FOCS, 2010.

[32] Francis R. Bach et al. Breaking the Curse of Dimensionality with Convex Neural Networks. Journal of Machine Learning Research, 2014.

[33] John P. Boyd et al. Asymptotic coefficients of Hermite function series. 1984.

[34] Adam R. Klivans et al. Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent. ICML, 2020.

[35] Adam Tauman Kalai et al. Potential-Based Agnostic Boosting. NIPS, 2009.