Variational Analysis Perspective on Linear Convergence of Some First Order Methods for Nonsmooth Convex Optimization Problems

We study the linear convergence of several first-order methods, namely the proximal gradient method (PGM), the proximal alternating linearized minimization (PALM) algorithm and the randomized block coordinate proximal gradient method (R-BCPGM), for minimizing the sum of a smooth convex function and a nonsmooth convex function. We introduce a new analytic framework based on the error bound, calmness, metric subregularity and bounded metric subregularity properties. This variational analysis perspective yields concrete sufficient conditions for linear convergence, together with practical approaches for calculating the linear convergence rates of these methods, for a class of structured convex problems. In particular, for the LASSO, the fused LASSO and the group LASSO, these conditions are satisfied automatically, and the modulus of calmness/metric subregularity is computable. Consequently, linear convergence of these first-order methods is guaranteed for these important applications, and the convergence rates can be calculated. The new perspective also allows us to improve some existing results and to obtain results not previously known in the literature. Specifically, we improve the known results on the linear convergence of the PGM and PALM for the structured convex problem by providing a computable error bound estimate. For the R-BCPGM applied to the structured convex problem, we prove that linear convergence is ensured when the nonsmooth part of the objective function is the group LASSO regularizer.
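As an illustration of the simplest setting mentioned above, the following is a minimal sketch of the PGM applied to the LASSO problem, min over x of 0.5*||Ax - b||^2 + lam*||x||_1. It is not the authors' implementation. The function names, the step size 1/||A||_F^2 (a conservative choice, since the squared Frobenius norm bounds the Lipschitz constant of the gradient) and the small test data are all assumptions made for illustration.

```python
# Illustrative sketch only: proximal gradient method (PGM) for the LASSO
#   min_x 0.5 * ||A x - b||^2 + lam * ||x||_1
# All names and the step-size rule are assumptions, not taken from the paper.

def soft_threshold(v, t):
    """Proximal operator of t * |.| evaluated at the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def residual(A, b, x):
    """Return A x - b for a matrix A given as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) - bi for row, bi in zip(A, b)]

def objective(A, b, lam, x):
    """LASSO objective value at x."""
    r = residual(A, b, x)
    return 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(v) for v in x)

def pgm_lasso(A, b, lam, iters):
    """Run PGM from x = 0 and return the final iterate and objective history."""
    m, n = len(A), len(A[0])
    # ||A||_F^2 >= lambda_max(A^T A), so this step is at most 1/L (conservative).
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    history = [objective(A, b, lam, x)]
    for _ in range(iters):
        r = residual(A, b, x)
        # Gradient of the smooth part: A^T (A x - b).
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Forward (gradient) step followed by the prox of step * lam * ||.||_1.
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
        history.append(objective(A, b, lam, x))
    return x, history

# Small hypothetical instance: with A = I the minimizer is soft_threshold(b, lam).
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.5]
x_final, history = pgm_lasso(A, b, lam=1.0, iters=60)
```

On this toy instance the distance from the first coordinate to its limit halves at every iteration, which is the kind of linear (geometric) rate the abstract refers to. The sketch does not compute any calmness or metric subregularity modulus.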
