Proximal Methods Avoid Active Strict Saddles of Weakly Convex Functions

We introduce a geometrically transparent strict saddle property for nonsmooth functions. This property guarantees that, when randomly initialized, simple proximal algorithms applied to weakly convex problems converge only to local minimizers. We argue that the strict saddle property may be a realistic assumption in applications, since it provably holds for generic semi-algebraic optimization problems.
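
The claim about proximal algorithms can be illustrated numerically. The minimal sketch below is not taken from the paper: it runs the classical proximal point method on the toy weakly convex function f(x) = |x^2 - 1|, which has a strict saddle at the origin and minimizers at x = ±1. The toy objective, step parameter, and scalar subproblem solver are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def f(x):
    # Weakly convex toy objective: |x^2 - 1| is 2-weakly convex,
    # with minimizers at x = +/-1 and a strict saddle (local max) at x = 0.
    return abs(x**2 - 1.0)

def prox_step(x, lam):
    # One proximal point step: argmin_y  f(y) + (1/(2*lam)) * (y - x)^2.
    # For lam < 1/2 the subproblem is strongly convex, so a bounded
    # scalar solver suffices for this illustration.
    res = minimize_scalar(lambda y: f(y) + (y - x) ** 2 / (2.0 * lam),
                          bounds=(x - 2.0, x + 2.0), method="bounded")
    return res.x

rng = np.random.default_rng(0)
x = rng.normal()      # random initialization (misses the saddle almost surely)
lam = 0.2             # step parameter below 1/rho = 1/2
for _ in range(100):
    x = prox_step(x, lam)
print(f"final iterate: {x:.6f}")   # lands near +1 or -1, not at the saddle 0
```

In this one-dimensional example the iterates move away from the saddle at 0 geometrically and settle at one of the two minimizers, matching the behavior the abstract describes for randomly initialized proximal methods.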
