Exponentially vanishing sub-optimal local minima in multilayer neural networks