Escaping strict saddle points of the Moreau envelope in nonsmooth optimization

Recent work has shown that stochastically perturbed gradient methods can efficiently escape strict saddle points of smooth functions. We extend this body of work to nonsmooth optimization by analyzing an inexact analogue of a stochastically perturbed gradient method applied to the Moreau envelope. The main conclusion is that a variety of algorithms for nonsmooth optimization can escape strict saddle points of the Moreau envelope at a controlled rate. The main technical insight is that typical algorithms applied to the proximal subproblem yield directions that approximate the gradient of the Moreau envelope in relative terms.
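To fix ideas, recall the standard construction behind this setup (a sketch in our notation, not taken from the paper itself), assuming, as is standard in this line of work, that $f$ is $\rho$-weakly convex and $\lambda \in (0, 1/\rho)$. The Moreau envelope and proximal map are
$$
f_\lambda(x) \;=\; \min_y \Big\{ f(y) + \tfrac{1}{2\lambda}\|y - x\|^2 \Big\},
\qquad
\operatorname{prox}_{\lambda f}(x) \;=\; \operatorname*{argmin}_y \Big\{ f(y) + \tfrac{1}{2\lambda}\|y - x\|^2 \Big\}.
$$
In this regime $f_\lambda$ is continuously differentiable with
$$
\nabla f_\lambda(x) \;=\; \lambda^{-1}\big(x - \operatorname{prox}_{\lambda f}(x)\big),
$$
and its stationary points coincide with those of $f$. Consequently, a subroutine that only approximately solves the proximal subproblem, returning $y \approx \operatorname{prox}_{\lambda f}(x)$, produces the direction $\lambda^{-1}(x - y)$, an estimate of $\nabla f_\lambda(x)$; on one natural reading, the abstract's "relative" accuracy means the estimation error is bounded by a fixed fraction of $\|\nabla f_\lambda(x)\|$ itself.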
