M L ] 4 M ar 2 01 6 Gradient Descent Converges to Minimizers

We show that gradient descent converges to a local minimizer , almost surely with random initialization. This is proved by applying the Stable Manifold Theorem from dynamical systems theory.

[1]  Philip E. Gill,et al.  Newton-type methods for unconstrained and linearly constrained optimization , 1974, Math. Program..

[2]  Michael Shub,et al.  The local theory of normally hyperbolic, invariant, compact manifolds , 1977 .

[3]  Danny C. Sorensen,et al.  On the use of directions of negative curvature in a modified newton method , 1979, Math. Program..

[4]  Katta G. Murty,et al.  Some NP-complete problems in quadratic and nonlinear programming , 1987, Math. Program..

[5]  Robert E. Mahony,et al.  Convergence of the Iterates of Descent Methods for Analytic Cost Functions , 2005, SIAM J. Optim..

[6]  Yurii Nesterov,et al.  Cubic regularization of Newton method and its global performance , 2006, Math. Program..

[7]  S. Smale Differentiable dynamical systems , 1967 .

[8]  R. Adler,et al.  Random Fields and Geometry , 2007 .

[9]  Alternating minimization and projection methods for nonconvex problems , 2008 .

[10]  J. Bolte,et al.  Characterizations of Lojasiewicz inequalities: Subgradient flows, talweg, convexity , 2009 .

[11]  Andrea Montanari,et al.  Matrix completion from a few entries , 2009, 2009 IEEE International Symposium on Information Theory.

[12]  Antonio Auffinger,et al.  Random Matrices and Complexity of Spin Glasses , 2010, 1003.1129.

[13]  Benar Fux Svaiter,et al.  Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods , 2013, Math. Program..

[14]  Yann LeCun,et al.  The Loss Surface of Multilayer Networks , 2014, ArXiv.

[15]  Surya Ganguli,et al.  On the saddle point problem for non-convex optimization , 2014, ArXiv.

[16]  Surya Ganguli,et al.  Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.

[17]  Xi Chen,et al.  Spectral Methods Meet EM: A Provably Optimal Algorithm for Crowdsourcing , 2014, J. Mach. Learn. Res..

[18]  Sanjeev Arora,et al.  Simple, Efficient, and Neural Algorithms for Sparse Coding , 2015, COLT.

[19]  Xiaodong Li,et al.  Optimal Rates of Convergence for Noisy Sparse Phase Retrieval via Thresholded Wirtinger Flow , 2015, ArXiv.

[20]  Furong Huang,et al.  Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.

[21]  Xiaodong Li,et al.  Phase Retrieval via Wirtinger Flow: Theory and Algorithms , 2014, IEEE Transactions on Information Theory.

[22]  John Wright,et al.  A Geometric Analysis of Phase Retrieval , 2016, International Symposium on Information Theory.

[23]  MODELS AND STOCHASTIC APPROXIMATIONS , 2016 .

[24]  John Wright,et al.  Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture , 2015, IEEE Transactions on Information Theory.

[25]  John Wright,et al.  Complete Dictionary Recovery Over the Sphere II: Recovery by Riemannian Trust-Region Method , 2015, IEEE Transactions on Information Theory.

[26]  Yurii Nesterov,et al.  Lectures on Convex Optimization , 2018 .