Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods
[1] Hervé Bourlard,et al. Generalization and Parameter Estimation in Feedforward Nets: Some Experiments , 1989, NIPS.
[2] J. Slawny,et al. Back propagation fails to separate where perceptrons succeed , 1989 .
[3] George Cybenko,et al. Approximation by superpositions of a sigmoidal function , 1989, Math. Control Signals Syst..
[4] Kurt Hornik,et al. Neural networks and principal component analysis: Learning from examples without local minima , 1989, Neural Networks.
[5] Kurt Hornik,et al. Multilayer feedforward networks are universal approximators , 1989, Neural Networks.
[6] Kurt Hornik,et al. Approximation capabilities of multilayer feedforward networks , 1991, Neural Networks.
[7] Ronald L. Rivest,et al. Training a 3-node neural network is NP-complete , 1988, COLT '88.
[8] Alberto Tesi,et al. On the Problem of Local Minima in Backpropagation , 1992, IEEE Trans. Pattern Anal. Mach. Intell..
[9] Andrew R. Barron,et al. Universal approximation bounds for superpositions of a sigmoidal function , 1993, IEEE Trans. Inf. Theory.
[10] Umesh V. Vazirani,et al. An Introduction to Computational Learning Theory , 1994 .
[11] Robert J. Marks,et al. Fourier Analysis and Filtering of a Single Hidden Layer Perceptron , 1994 .
[12] Raúl Rojas,et al. Neural Networks - A Systematic Introduction , 1996 .
[13] Peter L. Bartlett,et al. The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network , 1998, IEEE Trans. Inf. Theory.
[14] Peter L. Bartlett,et al. Neural Network Learning - Theoretical Foundations , 1999 .
[15] Christian Kuhlmann,et al. Hardness Results for General Two-Layer Neural Networks , 2000, COLT.
[16] Gene H. Golub,et al. Rank-One Approximation to High Order Tensors , 2001, SIAM J. Matrix Anal. Appl..
[17] Jiří Šíma,et al. Training a Single Sigmoidal Neuron Is Hard , 2002, Neural Comput..
[18] P. Bartlett,et al. Hardness results for neural network approximation problems , 1999, Theor. Comput. Sci..
[19] Andrew R. Barron,et al. Approximation and estimation bounds for artificial neural networks , 2004, Machine Learning.
[20] Aapo Hyvärinen,et al. Estimation of Non-Normalized Statistical Models by Score Matching , 2005, J. Mach. Learn. Res..
[21] M. Rudelson,et al. The smallest singular value of a random rectangular matrix , 2008, arXiv:0802.3956.
[22] Michael I. Jordan,et al. Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..
[23] Pablo A. Parrilo,et al. Latent variable graphical model selection via convex optimization , 2010, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[24] Shachar Lovett. An elementary proof of anti-concentration of polynomials in Gaussian variables , 2010, Electron. Colloquium Comput. Complex..
[25] Nando de Freitas,et al. On Autoencoders and Score Matching for Energy Based Models , 2011, ICML.
[26] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.
[27] Sham M. Kakade,et al. Random Design Analysis of Ridge Regression , 2012, COLT.
[28] Anima Anandkumar,et al. A Tensor Spectral Approach to Learning Mixed Membership Community Models , 2013, COLT.
[29] Antonio Auffinger,et al. Complexity of random smooth functions on the high-dimensional sphere , 2011, arXiv:1110.5872.
[30] Anima Anandkumar,et al. Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-1 Updates , 2014, ArXiv.
[31] Aditya Bhaskara,et al. Provable Bounds for Learning Some Deep Representations , 2013, ICML.
[32] Yoshua Bengio,et al. What regularized auto-encoders learn from the data-generating distribution , 2012, J. Mach. Learn. Res..
[33] Roi Livni,et al. On the Computational Efficiency of Training Neural Networks , 2014, NIPS.
[34] Prateek Jain,et al. Learning Sparsely Used Overcomplete Dictionaries , 2014, COLT.
[35] Aditya Bhaskara,et al. Smoothed analysis of tensor decompositions , 2013, STOC.
[36] Yann LeCun,et al. The Loss Surfaces of Multilayer Networks , 2014, ArXiv.
[37] Surya Ganguli,et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.
[38] Anima Anandkumar,et al. Tensor decompositions for learning latent variable models , 2012, J. Mach. Learn. Res..
[39] Le Song,et al. Nonparametric Estimation of Multi-View Latent Variable Models , 2013, ICML.
[40] Alexandr Andoni,et al. Learning Polynomials with Neural Networks , 2014, ICML.
[41] Shai Ben-David,et al. Understanding Machine Learning: From Theory to Algorithms , 2014 .
[42] Anima Anandkumar,et al. Score Function Features for Discriminative Learning: Matrix and Tensor Framework , 2014, ArXiv.
[43] Pravesh Kothari,et al. Almost Optimal Pseudorandom Generators for Spherical Caps , 2014, ArXiv.
[44] Alexander J. Smola,et al. Fast and Guaranteed Tensor Decomposition via Sketching , 2015, NIPS.
[45] René Vidal,et al. Global Optimality in Tensor Factorization, Deep Learning, and Beyond , 2015, ArXiv.
[46] Anima Anandkumar,et al. Provable Methods for Training Neural Networks with Sparse Connectivity , 2014, ICLR.
[47] Anima Anandkumar,et al. Learning Overcomplete Latent Variable Models through Tensor Methods , 2014, COLT.
[48] Yuchen Zhang,et al. L1-regularized Neural Networks are Improperly Learnable in Polynomial Time , 2015, ICML.
[49] Yoram Singer,et al. Train faster, generalize better: Stability of stochastic gradient descent , 2015, ICML.
[50] Aapo Hyvärinen,et al. Density Estimation in Infinite Dimensional Exponential Families , 2013, J. Mach. Learn. Res..