An algorithm for quadratic ℓ1-regularized optimization with a flexible active-set strategy

We present an active-set method for minimizing an objective that is the sum of a convex quadratic function and an ℓ1 regularization term. Unlike two-phase methods that alternate between a first-order active-set identification step and a subspace phase consisting of a cycle of conjugate gradient (CG) iterations, the method presented here is free to compute either a first-order proximal gradient step or a subspace CG step at each iteration. The choice between the two types of step is based on the relative magnitudes of scaled components of the minimum-norm subgradient of the objective function. The paper establishes global rates of convergence, as well as work complexity estimates, for two variants of our approach, which we call the interleaved iterative soft-thresholding algorithm (ISTA)–CG method. Numerical results illustrating the behaviour of the method on a variety of test problems are presented.
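To make the interleaving concrete, the sketch below implements one plausible instance of such a loop for min_x 0.5 x'Ax + b'x + λ‖x‖₁ with A symmetric positive definite. The switching rule, the fixed step size `alpha`, the threshold `theta`, and the truncated-CG budget `cg_iters` are illustrative assumptions, not the paper's exact criterion; the paper's rule compares scaled components of the minimum-norm subgradient.

```python
import numpy as np

def soft_threshold(z, t):
    """Componentwise soft-thresholding: S_t(z) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def min_norm_subgrad(x, g, lam):
    """Minimum-norm subgradient of phi(x) = q(x) + lam*||x||_1, with g = grad q(x).
    On nonzero components it is g_i + lam*sign(x_i); on zero components it is the
    projection of 0 onto the interval [g_i - lam, g_i + lam]."""
    v = g + lam * np.sign(x)
    zero = (x == 0)
    v[zero] = np.sign(g[zero]) * np.maximum(np.abs(g[zero]) - lam, 0.0)
    return v

def interleaved_ista_cg(A, b, lam, x0, alpha, max_iter=200,
                        cg_iters=5, theta=1.0, tol=1e-8):
    """Illustrative sketch (not the paper's algorithm verbatim) of an
    interleaved ISTA-CG loop for min_x 0.5*x'Ax + b'x + lam*||x||_1."""
    x = x0.copy()
    for _ in range(max_iter):
        g = A @ x + b
        v = min_norm_subgrad(x, g, lam)
        if np.linalg.norm(v) <= tol:
            break
        free = (x != 0)
        # Assumed switching rule: if the subgradient mass on the zero variables
        # dominates, take an ISTA step to let the active set change; otherwise
        # refine within the current support with a few CG iterations.
        if np.linalg.norm(v[~free]) > theta * np.linalg.norm(v[free]):
            x = soft_threshold(x - alpha * g, alpha * lam)      # ISTA step
        else:
            F = np.flatnonzero(free)
            rhs = -(g[F] + lam * np.sign(x[F]))                 # reduced gradient
            d = np.zeros_like(rhs)
            r = rhs.copy()
            p = r.copy()
            for _ in range(cg_iters):                           # truncated CG on A_FF d = rhs
                Ap = A[np.ix_(F, F)] @ p
                a = (r @ r) / (p @ Ap)
                d += a * p
                r_new = r - a * Ap
                if np.linalg.norm(r_new) < 1e-12:
                    break
                p = r_new + ((r_new @ r_new) / (r @ r)) * p
                r = r_new
            step = np.zeros_like(x)
            step[F] = d
            x = x + step  # a projected line search would normally guard sign changes
    return x
```

In this sketch the ISTA step is favoured when the subgradient mass sits on the zero variables, which signals that the active set should change, while the CG step refines the quadratic over the current support; a safeguard such as a projected line search, omitted here for brevity, would prevent the subspace step from crossing sign boundaries.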
