Spurious Valleys in Two-layer Neural Network Optimization Landscapes
[1] Francis R. Bach, et al. On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions, 2015, J. Mach. Learn. Res.
[2] Tengyu Ma, et al. Matrix Completion has No Spurious Local Minimum, 2016, NIPS.
[3] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[4] Tengyu Ma, et al. Identity Matters in Deep Learning, 2016, ICLR.
[5] Roi Livni, et al. On the Computational Efficiency of Training Neural Networks, 2014, NIPS.
[6] Lei Wu, et al. A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics, 2019, Science China Mathematics.
[7] Tengyu Ma, et al. Learning One-hidden-layer Neural Networks with Landscape Design, 2017, ICLR.
[8] Jeffrey Pennington, et al. Geometry of Neural Network Loss Surfaces via Random Matrix Theory, 2017, ICML.
[9] Yoram Singer, et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, 2016, NIPS.
[10] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.
[11] David Lopez-Paz, et al. Geometrical Insights for Implicit Generative Modeling, 2017, Braverman Readings in Machine Learning.
[12] Allan Pinkus, et al. Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function, 1991, Neural Networks.
[13] Yi Zheng, et al. No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis, 2017, ICML.
[14] Matthias Hein, et al. The Loss Surface of Deep and Wide Neural Networks, 2017, ICML.
[15] Aleksander Madry, et al. How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift), 2018, NeurIPS.
[16] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[17] Francis Bach, et al. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport, 2018, NeurIPS.
[18] Ohad Shamir, et al. On the Quality of the Initial Basin in Overspecified Neural Networks, 2015, ICML.
[19] David Tse, et al. Porcupine Neural Networks: (Almost) All Local Optima are Global, 2017, arXiv.
[20] Gene H. Golub, et al. Symmetric Tensors and Symmetric Tensor Rank, 2008, SIAM J. Matrix Anal. Appl.
[21] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[22] Matus Telgarsky, et al. Risk and parameter convergence of logistic regression, 2018, arXiv.
[23] Adel Javanmard, et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks, 2017, IEEE Transactions on Information Theory.
[24] Razvan Pascanu, et al. Local minima in training of neural networks, 2016, arXiv:1611.06310.
[25] Yuandong Tian, et al. Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima, 2017, ICML.
[26] Ohad Shamir, et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks, 2017, ICML.
[27] Nathan Srebro, et al. Characterizing Implicit Bias in Terms of Optimization Geometry, 2018, ICML.
[28] Kenji Kawaguchi, et al. Deep Learning without Poor Local Minima, 2016, NIPS.
[29] Kurt Hornik, et al. Approximation capabilities of multilayer feedforward networks, 1991, Neural Networks.
[30] Gábor Lugosi, et al. Concentration Inequalities: A Nonasymptotic Theory of Independence, 2013, Oxford University Press.
[31] Julien Mairal, et al. Group Invariance and Stability to Deformations of Deep Convolutional Representations, 2017, arXiv.
[32] Daniel Soudry, et al. No bad local minima: Data independent training error guarantees for multilayer neural networks, 2016, arXiv.
[33] Gilad Yehudai, et al. On the Power and Limitations of Random Features for Understanding Neural Networks, 2019, NeurIPS.
[34] Grant M. Rotskoff, et al. Neural Networks as Interacting Particle Systems: Asymptotic Convexity of the Loss Landscape and Universal Scaling of the Approximation Error, 2018, arXiv.
[35] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[36] Nicolas Boumal, et al. The non-convex Burer-Monteiro approach works on smooth semidefinite programs, 2016, NIPS.
[37] Nicolas Boumal, et al. On the low-rank approach for semidefinite programs arising in synchronization and community detection, 2016, COLT.
[38] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[39] Andrea Montanari, et al. On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition, 2018, AISTATS.
[40] Yi Zhou, et al. Critical Points of Neural Networks: Analytical Forms and Landscape Properties, 2017, arXiv.
[41] Helmut Bölcskei, et al. Optimal Approximation with Sparsely Connected Deep Neural Networks, 2017, SIAM J. Math. Data Sci.
[42] Nadav Cohen, et al. On the Expressive Power of Deep Learning: A Tensor Analysis, 2015, COLT.
[43] Ohad Shamir, et al. Are ResNets Provably Better than Linear Predictors?, 2018, NeurIPS.
[44] Nathan Srebro, et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, 2017, NIPS.
[45] Joan Bruna, et al. Topology and Geometry of Half-Rectified Network Optimization, 2016, ICLR.
[46] Jason D. Lee, et al. On the Power of Over-parametrization in Neural Networks with Quadratic Activation, 2018, ICML.
[47] Fred A. Hamprecht, et al. Essentially No Barriers in Neural Network Energy Landscape, 2018, ICML.
[48] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[49] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[50] Suvrit Sra, et al. A Critical View of Global Optimality in Deep Learning, 2018, arXiv.
[51] Francis R. Bach, et al. Breaking the Curse of Dimensionality with Convex Neural Networks, 2014, J. Mach. Learn. Res.
[52] Suvrit Sra, et al. Small nonlinearities in activation functions create bad local minima in neural networks, 2018, ICLR.
[53] Andrea Montanari, et al. A mean field view of the landscape of two-layer neural networks, 2018, Proceedings of the National Academy of Sciences.
[54] Samet Oymak, et al. Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks, 2019, IEEE Journal on Selected Areas in Information Theory.
[55] Le Song, et al. Diverse Neural Network Learns True Target Functions, 2016, AISTATS.
[56] Amnon Shashua, et al. Convolutional Rectifier Networks as Generalized Tensor Decompositions, 2016, ICML.