Accelerated Bregman proximal gradient methods for relatively smooth convex optimization

We consider the problem of minimizing the sum of two convex functions: one is differentiable and relatively smooth with respect to a reference convex function, and the other can be nondifferentiable but simple to optimize. We investigate a triangle scaling property of the Bregman distance generated by the reference convex function and present accelerated Bregman proximal gradient (ABPG) methods that attain an $O(k^{-\gamma})$ convergence rate, where $\gamma\in(0,2]$ is the triangle scaling exponent (TSE) of the Bregman distance. For the Euclidean distance we have $\gamma=2$, recovering the convergence rate of Nesterov's accelerated gradient methods. For non-Euclidean Bregman distances the TSE can be much smaller (e.g., $\gamma\leq 1$), but we show that a relaxed notion, the intrinsic TSE, is always equal to 2. We exploit the intrinsic TSE to develop adaptive ABPG methods that converge much faster in practice. Although theoretical guarantees for fast convergence rates appear out of reach in general, our methods attain empirical $O(k^{-2})$ rates in numerical experiments on several applications and provide a posteriori numerical certificates of the fast rates.
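To make the terminology concrete, here is a minimal sketch of the objects involved, using notation standard in the relative-smoothness literature (the symbols $h$, $D_h$, $L$, and $\Psi$ are conventions assumed for this sketch, not fixed by the abstract itself). Given a convex reference function $h$, the Bregman distance it generates is
$$D_h(x,y) \;=\; h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle,$$
and a differentiable convex $f$ is $L$-smooth relative to $h$ if
$$f(x) \;\le\; f(y) + \langle \nabla f(y),\, x - y \rangle + L\, D_h(x,y) \qquad \text{for all } x,\, y.$$
The triangle scaling property asks for an exponent $\gamma > 0$ such that
$$D_h\bigl((1-\theta)\bar{x} + \theta \tilde{x},\; (1-\theta)\bar{x} + \theta \tilde{y}\bigr) \;\le\; \theta^{\gamma}\, D_h(\tilde{x}, \tilde{y}) \qquad \text{for all } \theta \in (0,1].$$
As a sanity check of the $\gamma=2$ claim: for $h = \tfrac{1}{2}\|\cdot\|_2^2$ we have $D_h(x,y) = \tfrac{1}{2}\|x-y\|_2^2$, the two arguments on the left differ by $\theta(\tilde{x}-\tilde{y})$, and the inequality holds with equality at $\gamma = 2$. The basic (unaccelerated) Bregman proximal gradient step for minimizing $f + \Psi$, with $\Psi$ the simple nondifferentiable part, then reads
$$x_{k+1} \;=\; \arg\min_{x}\, \bigl\{ \langle \nabla f(x_k),\, x \rangle + \Psi(x) + L\, D_h(x, x_k) \bigr\},$$
and the ABPG methods described above accelerate this step by interpolating iterates with weights governed by $\gamma$.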
