Adaptive first-order methods revisited: Convex optimization without Lipschitz requirements

We propose a new family of adaptive first-order methods for a class of convex minimization problems that may fail to be Lipschitz continuous or smooth in the standard sense. Specifically, motivated by a recent flurry of activity on non-Lipschitz (NoLips) optimization, we consider problems that are continuous or smooth relative to a reference Bregman function – as opposed to a global, ambient norm (Euclidean or otherwise). These conditions encompass a wide range of problems with singular objectives, such as Fisher markets, Poisson tomography, D-design, and the like. In this setting, existing order-optimal adaptive methods – such as UniXGrad or AcceleGrad – cannot be applied, especially in the presence of randomness and uncertainty. The proposed method, adaptive mirror descent (AdaMir), aims to close this gap by simultaneously achieving min-max optimal rates in problems that are relatively continuous or relatively smooth, including stochastic ones.
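For concreteness, relative smoothness replaces the usual Lipschitz-gradient bound with a bound expressed through the Bregman divergence of a reference function h, and mirror descent is the basic template that AdaMir adapts. The sketch below states the condition and the generic update; the step-size schedule shown is an illustrative AdaGrad-style rule with unspecified per-iteration terms, not necessarily the exact AdaMir schedule.

```latex
% Relative smoothness: f is L-smooth relative to h if, for all x, y,
%   f(x) <= f(y) + <grad f(y), x - y> + L * D_h(x, y),
% where D_h is the Bregman divergence induced by the reference function h:
\[
  D_h(x, y) \;=\; h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle .
\]
% Mirror-descent step with (sub)gradient g_t and step size gamma_t,
% here given an AdaGrad-style schedule (the delta_s are unspecified
% per-iteration feedback terms, e.g. squared gradient norms in the
% classical AdaGrad rule; they stand in for the paper's exact choice):
\[
  x_{t+1} \;=\; \operatorname*{arg\,min}_{x \in \mathcal{X}}
    \bigl\{ \gamma_t \langle g_t, x \rangle + D_h(x, x_t) \bigr\},
  \qquad
  \gamma_t \;=\; \frac{\gamma_0}{\sqrt{1 + \sum_{s \le t} \delta_s}} .
\]
```

When h is the squared Euclidean norm, D_h reduces to the squared distance and the update recovers adaptive projected gradient descent; choosing h adapted to the problem geometry (e.g. an entropic or log-barrier function) is what allows objectives with singular gradients to satisfy the condition.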
