Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions

Given a non-convex twice differentiable cost function f, we prove that the set of initial conditions from which gradient descent converges to saddle points where the Hessian ∇²f has at least one strictly negative eigenvalue has (Lebesgue) measure zero, even for cost functions f with non-isolated critical points, answering an open question in [23]. Moreover, this result extends to forward-invariant convex subspaces, allowing for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce an upper bound on the allowable step-size.
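
To make the strict-saddle condition concrete, here is a minimal numerical sketch (ours, not from the paper): gradient descent on f(x, y) = x² − y², whose only critical point, the origin, is a saddle at which the Hessian diag(2, −2) has a strictly negative eigenvalue. The theorem implies that from Lebesgue-almost every initial condition, with a small enough step size, gradient descent does not converge to such a point; the step-size choice below is an illustrative assumption, using the common alpha < 1/L rule for an L-Lipschitz gradient.

import numpy as np

# Illustrative sketch, not the paper's code: gradient descent on
# f(x, y) = x^2 - y^2. The origin is a strict saddle: the Hessian
# diag(2, -2) has the strictly negative eigenvalue -2.
def grad_f(z):
    x, y = z
    return np.array([2.0 * x, -2.0 * y])

rng = np.random.default_rng(0)
z = rng.standard_normal(2)   # generic (measure-one) random initialization
alpha = 0.1                  # step size; grad_f is 2-Lipschitz here, so
                             # this satisfies alpha < 1/L = 0.5 (assumed bound)

for _ in range(50):
    z = z - alpha * grad_f(z)

# The x-coordinate contracts by (1 - 2*alpha) per step, while the
# y-coordinate grows by (1 + 2*alpha) per step: the iterates escape the
# saddle (f is unbounded below, so in this toy case they diverge).
print(z)

Reversing the sign of alpha, or initializing exactly on the stable manifold {y = 0}, would give convergence to the saddle, but that initial set has measure zero, which is precisely the dichotomy the abstract describes.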

[1] Danny C. Sorensen et al., On the use of directions of negative curvature in a modified Newton method, 1979, Math. Program.

[2] M. Shub, Global Stability of Dynamical Systems, 1986.

[3] R. Pemantle et al., Nonconvergence to Unstable Points in Urn Models and Stochastic Approximations, 1990.

[4] L. Perko, Differential Equations and Dynamical Systems, 1991.

[5] H. Sebastian Seung et al., Algorithms for Non-negative Matrix Factorization, 2000, NIPS.

[6] Yurii Nesterov et al., Cubic regularization of Newton method and its global performance, 2006, Math. Program.

[7] A. Ravindran et al., Engineering Optimization: Methods and Applications, 2006.

[8] Andrea Montanari et al., Matrix completion from a few entries, 2009, ISIT.

[9] Éva Tardos et al., Multiplicative updates outperform generic no-regret learning in congestion games: extended abstract, 2009, STOC '09.

[11] Surya Ganguli et al., Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.

[12] Umesh Vazirani et al., Algorithms, games, and evolution, 2014, Proceedings of the National Academy of Sciences.

[13] Andreas Krause et al., Advances in Neural Information Processing Systems (NIPS), 2014.

[14] Xi Chen et al., Spectral Methods Meet EM: A Provably Optimal Algorithm for Crowdsourcing, 2014, J. Mach. Learn. Res.

[15] Ruta Mehta et al., Natural Selection as an Inhibitor of Genetic Diversity: Multiplicative Weights Updates Algorithm and a Conjecture of Haploid Genetics [Working Paper Abstract], 2014, ITCS.

[16] Sanjeev Arora et al., Simple, Efficient, and Neural Algorithms for Sparse Coding, 2015, COLT.

[17] David C. Parkes et al., On Sex, Evolution, and the Multiplicative Weights Update Algorithm, 2015, AAMAS.

[18] Furong Huang et al., Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.

[19] Yann LeCun et al., The Loss Surfaces of Multilayer Networks, 2014, AISTATS.

[20] Xiaodong Li et al., Phase Retrieval via Wirtinger Flow: Theory and Algorithms, 2014, IEEE Transactions on Information Theory.

[21] Yann LeCun et al., Singularity of the Hessian in Deep Learning, 2016, arXiv.

[22] Michael I. Jordan et al., Gradient Descent Converges to Minimizers, 2016, arXiv.

[23] Michael I. Jordan et al., Gradient Descent Only Converges to Minimizers, 2016, COLT.

[24] Ruta Mehta et al., The Computational Complexity of Genetic Diversity, 2016, ESA.

[25] Georgios Piliouras et al., Average Case Performance of Replicator Dynamics in Potential Games via Computing Regions of Attraction, 2014, EC.

[26] D. M. V. Hesteren, Evolutionary Game Theory, 2017.

[27] Ruta Mehta et al., Mutation, Sexual Reproduction and Survival in Dynamic Environments, 2015, ITCS.

[28] John Wright et al., Complete Dictionary Recovery Over the Sphere II: Recovery by Riemannian Trust-Region Method, 2015, IEEE Transactions on Information Theory.

[29] Nitakshi Goyal et al., General Topology-I, 2017.

[30] M. Spivak, Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus, 2019.