Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals

We consider the problem of computing the best-fitting ReLU with respect to square loss on a training set whose examples are drawn from a spherical Gaussian distribution (the labels may be arbitrary). Let $\opt < 1$ denote the population loss of the best-fitting ReLU, that is, $\opt = \min_{w} \mathbb{E}_{(x,y)}\left[(\mathrm{ReLU}(w \cdot x) - y)^2\right]$ where $x \sim \mathcal{N}(0, I_d)$. We prove:
\begin{itemize}
\item Finding a ReLU with square loss $\opt + \epsilon$ is as hard as learning sparse parities with noise, a problem widely believed to be computationally intractable. This is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply, {\em unconditionally}, that gradient descent cannot converge to the global minimum in polynomial time.
\item There is an efficient approximation algorithm for finding the best-fitting ReLU that achieves error $O(\opt^{2/3})$. The algorithm uses a novel reduction to noisy halfspace learning with respect to $0/1$ loss.
\end{itemize}
Prior work due to Soltanolkotabi \cite{soltanolkotabi2017learning} showed that gradient descent {\em can} find the best-fitting ReLU with respect to Gaussian marginals if the training set is {\em exactly} labeled by a ReLU (that is, when $\opt = 0$).
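
To make the setup concrete, the following NumPy sketch illustrates the problem, not the paper's algorithm or its hardness reduction: examples are drawn from a spherical Gaussian, labels are a ReLU of the input plus noise so that $\opt > 0$, and plain gradient descent is run on the empirical square loss. The dimension, sample size, noise level, and step size are arbitrary illustrative choices.

```python
# Minimal sketch of the learning setup: x ~ N(0, I_d) with labels that are
# not exactly a ReLU (so opt > 0), plus vanilla gradient descent on the
# empirical square loss. Illustration only; not the paper's algorithm.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def square_loss(w, X, y):
    """Empirical square loss of the hypothesis x -> ReLU(w . x) on (X, y)."""
    return np.mean((relu(X @ w) - y) ** 2)

def gd_step(w, X, y, lr=0.1):
    """One subgradient step; the subgradient of ReLU at 0 is taken to be 0."""
    margins = X @ w
    residual = relu(margins) - y
    grad = 2.0 * (X.T @ (residual * (margins > 0))) / len(y)
    return w - lr * grad

rng = np.random.default_rng(0)
d, n = 10, 5000
X = rng.standard_normal((n, d))                      # spherical Gaussian marginals
w_star = rng.standard_normal(d)
y = relu(X @ w_star) + 0.1 * rng.standard_normal(n)  # noisy labels: opt > 0
w = rng.standard_normal(d)                           # random initialization
for _ in range(500):
    w = gd_step(w, X, y)
print("GD loss:", square_loss(w, X, y), "loss at w*:", square_loss(w_star, X, y))
```

In the exactly-realizable case ($\opt = 0$), \cite{soltanolkotabi2017learning} shows that gradient descent provably recovers the best-fitting ReLU; the hardness result above rules out such a polynomial-time guarantee once the labels are arbitrary.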

[1] Ohad Shamir, et al. Distribution-Specific Hardness of Learning Neural Networks, 2016, J. Mach. Learn. Res.

[2] Adam Tauman Kalai, et al. The Isotron Algorithm: High-Dimensional Isotonic Regression, 2009, COLT.

[3] Pasin Manurangsi, et al. The Computational Complexity of Training ReLU(s), 2018, ArXiv.

[4] Francois Buet-Golfouse. A Multinomial Theorem for Hermite Polynomials and Financial Applications, 2015.

[5] Emmanuel Abbe, et al. Provable limitations of deep learning, 2018, ArXiv.

[6] Gregory Valiant. Finding Correlations in Subquadratic Time, with Applications to Learning Parities and the Closest Pair Problem, 2015, J. ACM.

[7] John Wilmes, et al. Gradient Descent for One-Hidden-Layer Neural Networks: Polynomial Convergence and SQ Lower Bounds, 2018, COLT.

[8] Daniel M. Kane, et al. Learning geometric concepts with nasty noise, 2017, STOC.

[9] Inderjit S. Dhillon, et al. Recovery Guarantees for One-hidden-layer Neural Networks, 2017, ICML.

[10] Amir Globerson, et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs, 2017, ICML.

[11] Guanghui Lan, et al. Complexity of Training ReLU Neural Network, 2018, Discret. Optim.

[12] Ohad Shamir, et al. Failures of Gradient-Based Deep Learning, 2017, ICML.

[13] Pravesh Kothari, et al. Embedding Hard Learning Problems into Gaussian Space, 2014, Electron. Colloquium Comput. Complex.

[14] Balázs Szörényi. Characterizing Statistical Query Learning: Simplified Notions and Proofs, 2009, ALT.

[15] Xiao Zhang, et al. Learning One-hidden-layer ReLU Networks via Gradient Descent, 2018, AISTATS.

[16] Adam Tauman Kalai, et al. Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression, 2011, NIPS.

[17] Devavrat Shah, et al. Structure learning of antiferromagnetic Ising models, 2014, NIPS.

[18] Zhize Li, et al. Learning Two-layer Neural Networks with Symmetric Inputs, 2018, ICLR.

[19] Vitaly Feldman, et al. Statistical Query Learning, 2008, Encyclopedia of Algorithms.

[20] Maria-Florina Balcan, et al. The Power of Localization for Efficiently Learning Linear Separators with Noise, 2013, J. ACM.

[21] Chicheng Zhang, et al. Efficient active learning of sparse halfspaces, 2018, COLT.

[22] Roman Vershynin, et al. Four lectures on probabilistic methods for data science, 2016, IAS/Park City Mathematics Series.

[23] Varun Kanade, et al. Reliably Learning the ReLU in Polynomial Time, 2016, COLT.

[24] Rocco A. Servedio, et al. Agnostically learning halfspaces, 2005, FOCS.

[25] Santosh S. Vempala, et al. Statistical Query Algorithms for Stochastic Convex Optimization, 2015, ArXiv.

[26] Mahdi Soltanolkotabi. Learning ReLUs via Gradient Descent, 2017, NIPS.

[27] Raghu Meka, et al. Learning One Convolutional Layer with Overlapping Patches, 2018, ICML.

[28] Tengyu Ma, et al. Learning One-hidden-layer Neural Networks with Landscape Design, 2017, ICLR.

[29] Santosh S. Vempala, et al. Polynomial Convergence of Gradient Descent for Training One-Hidden-Layer Neural Networks, 2018, ArXiv.