Gradient Descent Can Take Exponential Time to Escape Saddle Points
Michael I. Jordan | Barnabás Póczos | Chi Jin | Jason D. Lee | Simon S. Du | Aarti Singh
[1] H. Whitney. Analytic Extensions of Differentiable Functions Defined in Closed Sets, 1934.
[2] J. Palis, et al. Geometric theory of dynamical systems: an introduction, 1984.
[3] A. Edelman, et al. Nonnegativity-, monotonicity-, or convexity-preserving cubic and quintic Hermite interpolation, 1989.
[4] R. Pemantle, et al. Nonconvergence to Unstable Points in Urn Models and Stochastic Approximations, 1990.
[5] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[6] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course, 2004, Applied Optimization.
[7] Yurii Nesterov, et al. Cubic regularization of Newton method and its global performance, 2006, Math. Program.
[8] Moritz Hardt, et al. Understanding Alternating Minimization for Matrix Completion, 2013, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.
[9] Alan J. Chang. The Whitney extension theorem in high dimensions, 2015, ArXiv, 1508.01779.
[10] Zhi-Quan Luo, et al. Guaranteed Matrix Completion via Non-Convex Factorization, 2014, IEEE Transactions on Information Theory.
[11] Prateek Jain, et al. Phase Retrieval Using Alternating Minimization, 2013, IEEE Transactions on Signal Processing.
[12] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[13] Xiaodong Li, et al. Phase Retrieval via Wirtinger Flow: Theory and Algorithms, 2014, IEEE Transactions on Information Theory.
[14] Nathan Srebro, et al. Tight Complexity Bounds for Optimizing Composite Objectives, 2016, NIPS.
[15] John Wright, et al. A Geometric Analysis of Phase Retrieval, 2016, International Symposium on Information Theory.
[16] Nathan Srebro, et al. Global Optimality of Local Search for Low Rank Matrix Recovery, 2016, NIPS.
[17] Yair Carmon, et al. Accelerated Methods for Non-Convex Optimization, 2016, SIAM J. Optim.
[18] Kfir Y. Levy, et al. The Power of Normalization: Faster Evasion of Saddle Points, 2016, ArXiv.
[19] Yair Carmon, et al. Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step, 2016, ArXiv.
[20] Michael I. Jordan, et al. Gradient Descent Converges to Minimizers, 2016, ArXiv.
[21] Constantine Caramanis, et al. Fast Algorithms for Robust PCA via Gradient Descent, 2016, NIPS.
[22] Michael I. Jordan, et al. Gradient Descent Only Converges to Minimizers, 2016, COLT.
[23] Tengyu Ma, et al. Matrix Completion has No Spurious Local Minimum, 2016, NIPS.
[24] Elad Hazan, et al. Finding Local Minima for Nonconvex Optimization in Linear Time, 2016.
[25] Junwei Lu, et al. Symmetry, Saddle Points, and Global Geometry of Nonconvex Matrix Factorization, 2016, ArXiv.
[26] Yi Zheng, et al. No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis, 2017, ICML.
[27] Xiao Zhang, et al. Stochastic Variance-reduced Gradient Descent for Low-rank Matrix Recovery from Linear Measurements, 2017, ArXiv, 1701.00481.
[28] John Wright, et al. Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture, 2015, IEEE Transactions on Information Theory.
[29] Daniel P. Robinson, et al. A trust region algorithm with a worst-case iteration complexity of O(ϵ^{-3/2}) for nonconvex optimization, 2016, Mathematical Programming.
[30] Prateek Jain, et al. Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot, 2015, AISTATS.
[31] Anastasios Kyrillidis, et al. Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach, 2016, AISTATS.
[32] Michael I. Jordan, et al. How to Escape Saddle Points Efficiently, 2017, ICML.
[33] Tengyu Ma, et al. Finding approximate local minima faster than gradient descent, 2016, STOC.
[34] Junwei Lu, et al. Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization, 2016, 2018 Information Theory and Applications Workshop (ITA).
[35] Yuandong Tian, et al. When is a Convolutional Filter Easy To Learn?, 2017, ICLR.