Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization
Tuo Zhao | Enlu Zhou | Yan Li | Tianyi Liu | Song Wei