\(\ell_{1, p}\)-Norm Regularization: Error Bounds and Convergence Rate Analysis of First-Order Methods

In recent years, the \(\ell_{1,p}\)-regularizer has been widely used to induce structured sparsity in the solutions to various optimization problems. Currently, such \(\ell_{1,p}\)-regularized problems are typically solved by first-order methods. Motivated by the desire to analyze the convergence rates of these methods, we show that for a large class of \(\ell_{1,p}\)-regularized problems, an error bound condition is satisfied when \(p \in [1, 2]\) or \(p = \infty\) but fails to hold for any \(p \in (2, \infty)\). Based on this result, we show that many first-order methods enjoy an asymptotic linear rate of convergence when applied to \(\ell_{1,p}\)-regularized linear or logistic regression with \(p \in [1, 2]\) or \(p = \infty\). By contrast, numerical experiments suggest that for the same class of problems with \(p \in (2, \infty)\), the aforementioned methods may not converge linearly.
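To make the setting concrete, the sketch below shows one standard first-order method of the kind the abstract refers to: proximal gradient descent (ISTA) applied to \(\ell_{1,2}\)-regularized least squares (the group lasso), i.e. the case \(p = 2\) where the abstract establishes a linear convergence rate. This is a minimal illustration, not the paper's own algorithm or code; the function names, the fixed group structure, and the step-size choice are assumptions made for the example. For \(p = 2\) the proximal operator of the regularizer has a well-known closed form, groupwise soft-thresholding.

```python
import numpy as np

def prox_l1_2(v, groups, tau):
    """Proximal operator of tau * sum_g ||v_g||_2 (groupwise soft-thresholding).

    For p = 2 the prox has the closed form v_g * max(0, 1 - tau/||v_g||_2)
    on each group g; for general p it must be computed numerically.
    """
    out = v.copy()
    for g in groups:
        norm = np.linalg.norm(v[g])
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out[g] = scale * v[g]
    return out

def ista_group_lasso(A, b, groups, lam, n_iter=500):
    """Proximal gradient (ISTA) for min_x 0.5*||Ax - b||^2 + lam * sum_g ||x_g||_2.

    Hypothetical helper for illustration only. Uses the constant step size
    1/L, where L = ||A||_2^2 is the Lipschitz constant of the smooth part's
    gradient.
    """
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)            # gradient of 0.5*||Ax - b||^2
        x = prox_l1_2(x - grad / L, groups, lam / L)
    return x
```

Under the error bound condition shown in the paper for \(p \in [1,2]\), iterations of exactly this form converge linearly in the asymptotic sense; swapping the prox for a \(p \in (2,\infty)\) variant is the regime where the experiments suggest linear convergence can fail.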
