Generalization Bounds for Neural Networks through Tensor Factorization

Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm, based on tensor decomposition, for training a two-layer neural network. We prove generalization bounds for the proposed method, with sample complexity polynomial in the relevant parameters, such as the input dimension and the number of neurons. While learning arbitrary target functions is NP-hard, we provide transparent conditions on the target function and the input distribution under which generalization is achievable. The tensor decomposition at the core of our method provably converges to the global optimum under a set of mild non-degeneracy conditions. The method consists of simple, embarrassingly parallel linear and multilinear operations, and is competitive with standard stochastic gradient descent (SGD) in terms of computational complexity. Thus, for the first time, we obtain a computationally efficient method for training neural networks with guaranteed generalization bounds.
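As a rough illustration of the tensor-based training pipeline described above, the sketch below forms an empirical cross-moment tensor between the label and the third-order score function of the input, and then decomposes it with a simple power-iteration-plus-deflation routine to recover the first-layer weight directions. It assumes standard Gaussian inputs (so the score function has a closed form) and uses illustrative names such as cross_moment_tensor and tensor_power_iteration that are not from the paper; the actual algorithm involves additional whitening and robustness steps, and a separate procedure for the remaining parameters.

```python
# Minimal sketch, assuming x ~ N(0, I_d) so the third-order score function
# has the closed form S3(x) = x⊗x⊗x minus the symmetrized x⊗I terms.
# All names and simplifications here are illustrative, not from the paper.
import numpy as np

def score_tensor_gaussian(x):
    """Third-order score function of a standard Gaussian input."""
    d = x.shape[0]
    I = np.eye(d)
    T = np.einsum('i,j,k->ijk', x, x, x)
    T -= np.einsum('i,jk->ijk', x, I)
    T -= np.einsum('j,ik->ijk', x, I)
    T -= np.einsum('k,ij->ijk', x, I)
    return T

def cross_moment_tensor(X, y):
    """Empirical estimate of E[y * S3(x)]; by a Stein-type identity this is
    (approximately) a weighted sum of rank-1 terms a_j ⊗ a_j ⊗ a_j over the
    hidden units, where a_j are the first-layer weight vectors."""
    d = X.shape[1]
    T = np.zeros((d, d, d))
    for x_i, y_i in zip(X, y):
        T += y_i * score_tensor_gaussian(x_i)
    return T / len(y)

def tensor_power_iteration(T, k, n_iter=100, n_restarts=10, seed=None):
    """Extract k approximate rank-1 components of a symmetric 3rd-order tensor
    by power iteration with deflation (a basic variant; provable guarantees
    require a more careful whitened/robust version)."""
    rng = np.random.default_rng(seed)
    d = T.shape[0]
    vecs, weights = [], []
    T_res = T.copy()
    for _ in range(k):
        best_v, best_lam = None, -np.inf
        for _ in range(n_restarts):
            v = rng.normal(size=d)
            v /= np.linalg.norm(v)
            for _ in range(n_iter):
                v = np.einsum('ijk,j,k->i', T_res, v, v)
                v /= np.linalg.norm(v) + 1e-12
            lam = np.einsum('ijk,i,j,k->', T_res, v, v, v)
            if lam > best_lam:
                best_v, best_lam = v, lam
        vecs.append(best_v)
        weights.append(best_lam)
        T_res -= best_lam * np.einsum('i,j,k->ijk', best_v, best_v, best_v)
    return np.array(vecs), np.array(weights)
```

Once the first-layer directions are recovered up to sign and scale, the remaining parameters reduce to low-dimensional estimation problems; a simple stand-in is to regress the labels on the resulting hidden activations to obtain the second-layer weights, although the paper's own procedure for those parameters differs.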
