Sub-Optimal Local Minima Exist for Neural Networks with Almost All Non-Linear Activations