On the Computational Efficiency of Training Neural Networks

It is well known that neural networks are computationally hard to train. On the other hand, in practice, modern-day neural networks are trained efficiently using SGD and a variety of tricks, including different activation functions (e.g., ReLU), over-specification (i.e., training networks that are larger than needed), and regularization. In this paper we revisit the computational complexity of training neural networks from a modern perspective. We provide both positive and negative results, some of which yield new provably efficient and practical algorithms for training certain types of neural networks.
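To make the practical setup referenced above concrete, here is a minimal sketch (not the paper's algorithm) of the kind of training it describes: an over-specified one-hidden-layer ReLU network fit with plain SGD on synthetic data. All names, widths, and hyperparameters are arbitrary illustrative choices.

```python
# Minimal illustrative sketch, assuming a squared-loss regression task:
# train an over-specified one-hidden-layer ReLU network with plain SGD.
# Hyperparameters (hidden width, learning rate, batch size) are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data generated by a small "teacher" ReLU network.
d, n = 10, 512                                   # input dim, sample count
X = rng.standard_normal((n, d))
W_teacher = rng.standard_normal((d, 4))
y = np.maximum(X @ W_teacher, 0.0).sum(axis=1, keepdims=True)

# Over-specified "student": far more hidden units than the teacher uses.
h = 128                                          # hidden width
W1 = rng.standard_normal((d, h)) * np.sqrt(2.0 / d)
b1 = np.zeros(h)
W2 = rng.standard_normal((h, 1)) * np.sqrt(2.0 / h)

lr, batch, epochs = 1e-3, 32, 200
for _ in range(epochs):
    perm = rng.permutation(n)
    for start in range(0, n, batch):
        idx = perm[start:start + batch]
        xb, yb = X[idx], y[idx]

        # Forward pass with ReLU activation.
        z = xb @ W1 + b1
        a = np.maximum(z, 0.0)
        err = a @ W2 - yb                        # squared-loss residual

        # Backward pass: manual gradients for both layers.
        grad_W2 = a.T @ err / len(idx)
        grad_z = (err @ W2.T) * (z > 0)          # ReLU derivative
        grad_W1 = xb.T @ grad_z / len(idx)
        grad_b1 = grad_z.mean(axis=0)

        # Plain SGD update.
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2

mse = float(np.mean((np.maximum(X @ W1 + b1, 0.0) @ W2 - y) ** 2))
print(f"final training MSE: {mse:.4f}")
```

The sketch only illustrates the empirical recipe (SGD, ReLU activations, over-specification); the paper's formal results concern when such training can or cannot be done efficiently with provable guarantees.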
