Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions

Given a non-convex twice differentiable cost function f, we prove that the set of initial conditions from which gradient descent converges to saddle points where the Hessian ∇²f has at least one strictly negative eigenvalue has (Lebesgue) measure zero, even for cost functions f with non-isolated critical points, answering an open question in [23]. Moreover, this result extends to forward-invariant convex subspaces, allowing for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce an upper bound on the allowable step-size.
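
To make the strict-saddle condition concrete, here is a minimal numerical sketch (ours, not from the paper): gradient descent on f(x, y) = x² − y², whose only critical point, the origin, is a saddle at which the Hessian diag(2, −2) has a strictly negative eigenvalue. The theorem implies that from Lebesgue-almost every initial condition, with a small enough step size, gradient descent does not converge to such a point; the step-size choice below is an illustrative assumption, using the common alpha < 1/L rule for an L-Lipschitz gradient.

import numpy as np

# Illustrative sketch, not the paper's code: gradient descent on
# f(x, y) = x^2 - y^2. The origin is a strict saddle: the Hessian
# diag(2, -2) has the strictly negative eigenvalue -2.
def grad_f(z):
    x, y = z
    return np.array([2.0 * x, -2.0 * y])

rng = np.random.default_rng(0)
z = rng.standard_normal(2)   # generic (measure-one) random initialization
alpha = 0.1                  # step size; grad_f is 2-Lipschitz here, so
                             # this satisfies alpha < 1/L = 0.5 (assumed bound)

for _ in range(50):
    z = z - alpha * grad_f(z)

# The x-coordinate contracts by (1 - 2*alpha) per step, while the
# y-coordinate grows by (1 + 2*alpha) per step: the iterates escape the
# saddle (f is unbounded below, so in this toy case they diverge).
print(z)

Reversing the sign of alpha, or initializing exactly on the stable manifold {y = 0}, would give convergence to the saddle, but that initial set has measure zero, which is precisely the dichotomy the abstract describes.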

[1] Danny C. Sorensen et al., On the use of directions of negative curvature in a modified Newton method, 1979, Math. Program.

[2] M. Shub, Global Stability of Dynamical Systems, 1986.

[3] R. Pemantle et al., Nonconvergence to Unstable Points in Urn Models and Stochastic Approximations, 1990.

[4] L. Perko, Differential Equations and Dynamical Systems, 1991.

[5] H. Sebastian Seung et al., Algorithms for Non-negative Matrix Factorization, 2000, NIPS.

[6] Yurii Nesterov et al., Cubic regularization of Newton method and its global performance, 2006, Math. Program.

[7] A. Ravindran et al., Engineering Optimization: Methods and Applications, 2006.

[8] Andrea Montanari et al., Matrix completion from a few entries, 2009, ISIT.

[9] Éva Tardos et al., Multiplicative updates outperform generic no-regret learning in congestion games: extended abstract, 2009, STOC '09.

[11] Surya Ganguli et al., Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.

[12] Umesh Vazirani et al., Algorithms, games, and evolution, 2014, Proceedings of the National Academy of Sciences.

[13] Andreas Krause et al., Advances in Neural Information Processing Systems (NIPS), 2014.

[14] Xi Chen et al., Spectral Methods Meet EM: A Provably Optimal Algorithm for Crowdsourcing, 2014, J. Mach. Learn. Res.

[15] Ruta Mehta et al., Natural Selection as an Inhibitor of Genetic Diversity: Multiplicative Weights Updates Algorithm and a Conjecture of Haploid Genetics [Working Paper Abstract], 2014, ITCS.

[16] Sanjeev Arora et al., Simple, Efficient, and Neural Algorithms for Sparse Coding, 2015, COLT.

[17] David C. Parkes et al., On Sex, Evolution, and the Multiplicative Weights Update Algorithm, 2015, AAMAS.

[18] Furong Huang et al., Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.

[19] Yann LeCun et al., The Loss Surfaces of Multilayer Networks, 2014, AISTATS.

[20] Xiaodong Li et al., Phase Retrieval via Wirtinger Flow: Theory and Algorithms, 2014, IEEE Transactions on Information Theory.

[21] Yann LeCun et al., Singularity of the Hessian in Deep Learning, 2016, arXiv.

[22] Michael I. Jordan et al., Gradient Descent Converges to Minimizers, 2016, arXiv.

[23] Michael I. Jordan et al., Gradient Descent Only Converges to Minimizers, 2016, COLT.

[24] Ruta Mehta et al., The Computational Complexity of Genetic Diversity, 2016, ESA.

[25] Georgios Piliouras et al., Average Case Performance of Replicator Dynamics in Potential Games via Computing Regions of Attraction, 2014, EC.

[26] D. M. V. Hesteren, Evolutionary Game Theory, 2017.

[27] Ruta Mehta et al., Mutation, Sexual Reproduction and Survival in Dynamic Environments, 2015, ITCS.

[28] John Wright et al., Complete Dictionary Recovery Over the Sphere II: Recovery by Riemannian Trust-Region Method, 2015, IEEE Transactions on Information Theory.

[29] Nitakshi Goyal et al., General Topology-I, 2017.

[30] M. Spivak, Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus, 2019.