Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
Simon S. Du | Jason D. Lee | Yuandong Tian | Barnabás Póczos | Aarti Singh
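Before the references, a minimal sketch of the setting named in the title: a one-hidden-layer CNN whose single filter w is shared across non-overlapping input patches, with output weights a, trained by plain gradient descent on Gaussian inputs in a teacher–student (realizable) setup. All dimensions, hyperparameters, and names below are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Illustrative sketch: one filter w shared across k non-overlapping patches,
# output weights a, squared loss, plain gradient descent, Gaussian inputs.
rng = np.random.default_rng(0)
k, p, n = 8, 16, 2048                 # patches per input, patch size, samples
X = rng.standard_normal((n, k, p))    # each row of X[i] is one input patch

w_star = rng.standard_normal(p)       # assumed teacher parameters
a_star = rng.standard_normal(k)

def forward(X, w, a):
    """f(x; w, a) = a^T ReLU(Z w), where Z stacks the k patches of x."""
    return np.maximum(X @ w, 0.0) @ a

y = forward(X, w_star, a_star)        # noiseless labels from the teacher

w = rng.standard_normal(p)            # student, random initialization
a = rng.standard_normal(k)
lr = 1e-2

for step in range(3000):
    h = np.maximum(X @ w, 0.0)        # hidden activations, shape (n, k)
    r = h @ a - y                     # residuals of the squared loss
    grad_a = h.T @ r / n
    mask = (X @ w > 0).astype(X.dtype)  # a.e. derivative of ReLU
    grad_w = np.einsum('n,nk,nkp->p', r, mask * a, X) / n
    a -= lr * grad_a
    w -= lr * grad_w

print("final loss:", 0.5 * np.mean((forward(X, w, a) - y) ** 2))
```

The teacher–student construction mirrors the realizable regression setting typically analyzed in this line of work, where the question is whether randomly initialized gradient descent recovers the planted parameters.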
[1] Ohad Shamir, et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks, 2017, ICML.
[2] Lawrence K. Saul, et al. Kernel Methods for Deep Learning, 2009, NIPS.
[3] Yair Carmon, et al. Accelerated Methods for Non-Convex Optimization, 2016, SIAM J. Optim.
[4] Varun Kanade, et al. Reliably Learning the ReLU in Polynomial Time, 2016, COLT.
[5] Anastasios Kyrillidis, et al. Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach, 2016, AISTATS.
[6] Michael I. Jordan, et al. How to Escape Saddle Points Efficiently, 2017, ICML.
[7] Kenji Kawaguchi, et al. Deep Learning without Poor Local Minima, 2016, NIPS.
[8] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[9] René Vidal, et al. Global Optimality in Tensor Factorization, Deep Learning, and Beyond, 2015, ArXiv.
[10] Anima Anandkumar, et al. Provable Methods for Training Neural Networks with Sparse Connectivity, 2014, ICLR.
[11] Joan Bruna, et al. Topology and Geometry of Half-Rectified Network Optimization, 2016, ICLR.
[12] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, IEEE International Conference on Computer Vision (ICCV).
[13] Zeyuan Allen-Zhu, et al. Natasha 2: Faster Non-Convex Optimization Than SGD, 2017, NeurIPS.
[14] Anima Anandkumar, et al. Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods, 2017.
[15] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[16] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[17] Tengyu Ma, et al. Identity Matters in Deep Learning, 2016, ICLR.
[18] Ohad Shamir, et al. Failures of Gradient-Based Deep Learning, 2017, ICML.
[19] Ruslan Salakhutdinov, et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks, 2015, NIPS.
[20] Andrea Montanari, et al. The landscape of empirical risk for nonconvex losses, 2016, The Annals of Statistics.
[21] Roi Livni, et al. On the Computational Efficiency of Training Neural Networks, 2014, NIPS.
[22] Yuandong Tian, et al. An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis, 2017, ICML.
[23] Jiashi Feng, et al. The Landscape of Deep Learning Algorithms, 2017, ArXiv.
[24] Nathan Srebro, et al. Global Optimality of Local Search for Low Rank Matrix Recovery, 2016, NIPS.
[25] Mahdi Soltanolkotabi, et al. Learning ReLUs via Gradient Descent, 2017, NIPS.
[26] Tengyu Ma, et al. Learning One-hidden-layer Neural Networks with Landscape Design, 2017, ICLR.
[27] Yuandong Tian, et al. When is a Convolutional Filter Easy To Learn?, 2017, ICLR.
[28] Inderjit S. Dhillon, et al. Recovery Guarantees for One-hidden-layer Neural Networks, 2017, ICML.
[29] Adam R. Klivans, et al. Learning Depth-Three Neural Networks in Polynomial Time, 2017, ArXiv.
[30] Tengyu Ma, et al. Finding approximate local minima faster than gradient descent, 2016, STOC.
[31] Yi Zheng, et al. No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis, 2017, ICML.
[32] Tengyu Ma, et al. Matrix Completion has No Spurious Local Minimum, 2016, NIPS.
[33] John Wright, et al. Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture, 2015, IEEE Transactions on Information Theory.
[34] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[35] Ohad Shamir, et al. Weight Sharing is Crucial to Succesful Optimization, 2017, ArXiv.
[36] Amir Globerson, et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs, 2017, ICML.
[37] Arian Maleki, et al. Global Analysis of Expectation Maximization for Mixtures of Two Gaussians, 2016, NIPS.
[38] Matthias Hein, et al. The Loss Surface of Deep and Wide Neural Networks, 2017, ICML.
[39] Yurii Nesterov, et al. Cubic regularization of Newton method and its global performance, 2006, Math. Program.
[40] Adam R. Klivans, et al. Eigenvalue Decay Implies Polynomial-Time Learnability for Neural Networks, 2017, NIPS.
[41] Junwei Lu, et al. Symmetry, Saddle Points, and Global Geometry of Nonconvex Matrix Factorization, 2016, ArXiv.
[42] Inderjit S. Dhillon, et al. Learning Non-overlapping Convolutional Neural Networks with Multiple Kernels, 2017, ArXiv.
[43] Matthias Hein, et al. The loss surface and expressivity of deep convolutional neural networks, 2017, ICLR.
[44] Kfir Y. Levy, et al. The Power of Normalization: Faster Evasion of Saddle Points, 2016, ArXiv.
[45] Le Song, et al. Diverse Neural Network Learns True Target Functions, 2016, AISTATS.
[46] Yann Dauphin, et al. Language Modeling with Gated Convolutional Networks, 2016, ICML.
[47] Yann LeCun, et al. Open Problem: The landscape of the loss surfaces of multilayer networks, 2015, COLT.
[48] Yuanzhi Li, et al. Convergence Analysis of Two-layer Neural Networks with ReLU Activation, 2017, NIPS.
[49] Martin J. Wainwright, et al. Learning Halfspaces and Neural Networks with Random Initialization, 2015, ArXiv.
[50] Ronald L. Rivest, et al. Training a 3-node neural network is NP-complete, 1988, COLT '88.
[51] Michael I. Jordan, et al. Gradient Descent Only Converges to Minimizers, 2016, COLT.
[52] Christos Tzamos, et al. Ten Steps of EM Suffice for Mixtures of Two Gaussians, 2016, COLT.
[53] David Tse, et al. Porcupine Neural Networks: (Almost) All Local Optima are Global, 2017, ArXiv.
[54] Michael I. Jordan, et al. Gradient Descent Can Take Exponential Time to Escape Saddle Points, 2017, NIPS.
[55] Rina Panigrahy, et al. Convergence Results for Neural Networks via Electrodynamics, 2017, ITCS.
[56] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.
[57] Moritz Hardt, et al. The Noisy Power Method: A Meta Algorithm with Applications, 2013, NIPS.
[58] Ohad Shamir, et al. Distribution-Specific Hardness of Learning Neural Networks, 2016, J. Mach. Learn. Res.
[59] Max Simchowitz, et al. Low-rank Solutions of Linear Matrix Equations via Procrustes Flow, 2015, ICML.
[60] Jiří Šíma, et al. Training a Single Sigmoidal Neuron Is Hard, 2002, Neural Comput.
[61] Ohad Shamir, et al. On the Quality of the Initial Basin in Overspecified Neural Networks, 2015, ICML.
[62] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.
[63] John Wright, et al. A Geometric Analysis of Phase Retrieval, 2016, International Symposium on Information Theory.
[64] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.