Convergence of adaptive algorithms for weakly convex constrained optimization

We analyze the adaptive first-order algorithm AMSGrad for solving a constrained stochastic optimization problem with a weakly convex objective. We prove a $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope, which is the standard stationarity measure for this class of problems. This rate matches the known rates that adaptive algorithms enjoy in the specific case of unconstrained smooth stochastic optimization. Our analysis allows a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains. Finally, we illustrate applications and extensions of our results to specific problems and algorithms.
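To make the setting concrete, below is a minimal sketch of a projected AMSGrad iteration for a constrained problem, with mini-batch size $1$ and constant moment parameters as in the abstract. The names `grad_oracle` and `project`, the plain Euclidean projection step, and the hyperparameter values are illustrative assumptions rather than the exact scheme analyzed in the paper (whose analysis may, for example, use a projection in the adaptive metric induced by the second-moment estimate). Recall that the Moreau envelope of the objective $\varphi$ is $\varphi_\lambda(x) = \min_y \{\varphi(y) + \tfrac{1}{2\lambda}\|y - x\|^2\}$, and the norm of its gradient along the iterates is the stationarity measure referred to above.

```python
import numpy as np

def projected_amsgrad(grad_oracle, project, x0, num_steps=1000,
                      alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Sketch of AMSGrad with a projection step after each update.

    grad_oracle(x): returns a stochastic (sub)gradient at x (mini-batch size 1).
    project(x): returns the Euclidean projection of x onto the constraint set.
    """
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)       # first-moment estimate (constant beta1)
    v = np.zeros_like(x)       # second-moment estimate (constant beta2)
    v_hat = np.zeros_like(x)   # running max of v: the AMSGrad modification

    for _ in range(num_steps):
        g = grad_oracle(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g**2
        v_hat = np.maximum(v_hat, v)              # keeps effective step sizes non-increasing
        x = x - alpha * m / (np.sqrt(v_hat) + eps)
        x = project(x)                            # enforce the constraint
    return x
```

As a usage example under these assumptions, `project = lambda x: np.clip(x, -1.0, 1.0)` handles a box constraint, and the weakly convex objective itself only needs to admit a stochastic subgradient oracle.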
