Efficient Mixed-Norm Regularization: Algorithms and Safe Screening Methods

Sparse learning has recently received increasing attention in many areas, including machine learning, statistics, and applied mathematics. Mixed-norm regularization based on the l1/lq norm with q > 1 is attractive in many regression and classification applications because it promotes group sparsity in the model. The resulting optimization problem is, however, challenging to solve due to the inherent structure of the l1/lq regularization. Existing work handles the special cases q = 2 and q = ∞, and cannot easily be extended to the general case. In this paper, we propose an efficient algorithm based on the accelerated gradient method for solving the l1/lq-regularized problem; it is applicable to all values of q larger than 1, thus significantly extending existing work. One key building block of the proposed algorithm is the l1/lq-regularized Euclidean projection (EP1q). Our theoretical analysis reveals the key properties of EP1q and illustrates why EP1q for general q is significantly more challenging to solve than the special cases. Based on this analysis, we develop an efficient algorithm for EP1q by solving two zero-finding problems. To further improve the efficiency of solving large-dimensional l1/lq-regularized problems, we propose an efficient and effective “screening” method that quickly identifies the inactive groups, i.e., groups whose components are all zero in the solution. This can lead to a substantial reduction in the number of groups entered into the optimization. An appealing feature of our screening method is that the data set needs to be scanned only once to run the screening, and its computational cost is negligible compared to that of solving the l1/lq-regularized problem. The key to the proposed screening method is an accurate sensitivity analysis of the dual optimal solution as the regularization parameter varies. Experimental results demonstrate the efficiency of the proposed algorithm.
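
To make the overall framework concrete, below is a minimal sketch of an accelerated (FISTA-style) proximal gradient loop for an l1/lq-regularized least-squares problem, specialized to q = 2, where the proximal step (EP1q) has a closed form given by group soft-thresholding. For general q > 1 the paper instead solves two zero-finding problems inside this step, and the screening test is not reproduced here; function names such as fista_group_lasso and prox_group_l2 are illustrative rather than taken from the paper.

```python
import numpy as np

def prox_group_l2(v, groups, tau):
    """Group soft-thresholding: the proximal operator of tau * sum_g ||w_g||_2.
    This is the q = 2 special case of EP1q; for general q > 1 the paper
    solves two zero-finding problems instead (not reproduced here)."""
    w = v.copy()
    for g in groups:
        norm_g = np.linalg.norm(v[g])
        scale = max(0.0, 1.0 - tau / norm_g) if norm_g > 0 else 0.0
        w[g] = scale * v[g]
    return w

def fista_group_lasso(X, y, groups, lam, n_iter=200):
    """Accelerated proximal gradient (FISTA-style) sketch for
    min_w 0.5 * ||X w - y||^2 + lam * sum_g ||w_g||_2."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the smooth loss gradient
    w = np.zeros(p)
    z = w.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ z - y)            # gradient of the least-squares loss at z
        w_new = prox_group_l2(z - grad / L, groups, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = w_new + ((t - 1.0) / t_new) * (w_new - w)   # Nesterov momentum step
        w, t = w_new, t_new
    return w
```

Here, groups is a list of index arrays partitioning the feature indices into non-overlapping groups, and lam is the regularization parameter; groups screened out as inactive beforehand would simply be removed from this list before running the solver.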
