The Computational Complexity of Training ReLU(s)

We consider the computational complexity of training depth-2 neural networks composed of rectified linear units (ReLUs). We show that, even for a single ReLU, finding a set of weights that minimizes the squared error on a given training set (even approximately) is NP-hard. We also show that for a simple network consisting of two ReLUs, the error-minimization problem is NP-hard even in the realizable case. We complement these hardness results by showing that, when the weights and samples belong to the unit ball, one can (agnostically) properly and reliably learn depth-2 ReLU networks with $k$ units and error at most $\epsilon$ in time $2^{(k/\epsilon)^{O(1)}} \cdot n^{O(1)}$; this extends a previous work of Goel, Kanade, Klivans and Thaler (2017), which provided efficient improper learning algorithms for ReLUs.
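
For concreteness, the training objective underlying these results can be written as an empirical squared-error minimization; the notation below (hidden weights $w_1, \dots, w_k$, output weights $a_1, \dots, a_k$, and a training set $(x_1, y_1), \dots, (x_m, y_m)$) is ours and is only a sketch of the general form, not the paper's exact formulation:
\[
\min_{w_1, \dots, w_k,\; a_1, \dots, a_k} \; \sum_{i=1}^{m} \Bigl( \sum_{j=1}^{k} a_j \max\bigl(0, \langle w_j, x_i \rangle\bigr) - y_i \Bigr)^{2}.
\]
Taking $k = 1$ (with the output weight fixed to $1$) recovers the single-ReLU objective of the first hardness result, and $k = 2$ corresponds to the two-ReLU network of the second; the realizable case is the one in which some choice of weights drives this objective to zero.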

[1] Guanghui Lan, et al. Complexity of Training ReLU Neural Network, 2018, Discret. Optim.

[2] Roi Livni, et al. On the Computational Efficiency of Training Neural Networks, 2014, NIPS.

[3] Ambuj Tewari, et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization, 2008, NIPS.

[4] David Haussler. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications, 1992, Inf. Comput.

[5] Russell Impagliazzo, et al. Complexity of k-SAT, 1999, CCC.

[6] Russell Impagliazzo, et al. Which problems have strongly exponential complexity?, 1998, FOCS.

[7] Adam Tauman Kalai, et al. Reliable Agnostic Learning, 2009, COLT.

[8] Irit Dinur, et al. On the hardness of approximating label-cover, 2004, Inf. Process. Lett.

[9] Guy Kindler, et al. Polynomially Low Error PCPs with polyloglog n Queries via Modular Composition, 2015, STOC.

[10] Francis R. Bach. Breaking the Curse of Dimensionality with Convex Neural Networks, 2014, J. Mach. Learn. Res.

[11] R. Schapire, et al. Toward efficient agnostic learning, 1992, COLT '92.

[12] Sébastien Bubeck. Convex Optimization: Algorithms and Complexity, 2014, Found. Trends Mach. Learn.

[13] Andrew L. Maas, et al. Rectifier Nonlinearities Improve Neural Network Acoustic Models, 2013.

[14] Michael Alekhnovich, et al. Minimum propositional proof length is NP-hard to linearly approximate, 1998, Journal of Symbolic Logic.

[15] Ronald L. Rivest, et al. Training a 3-node neural network is NP-complete, 1988, COLT '88.

[16] Amir Globerson, et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs, 2017, ICML.

[17] Raman Arora, et al. Understanding Deep Neural Networks with Rectified Linear Units, 2016, Electron. Colloquium Comput. Complex.

[18] Varun Kanade, et al. Reliably Learning the ReLU in Polynomial Time, 2017, COLT.

[19] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.

[20] Leslie G. Valiant. A theory of the learnable, 1984, STOC '84.

[21] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.