A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups

We address the problem of variable selection when variables must be selected group-wise, with possibly overlapping groups defined a priori. In particular, we propose a new optimization procedure for solving the regularized algorithm presented in [19], where the group lasso penalty is generalized to overlapping groups of variables. While the implementation proposed in [19] requires explicit replication of the variables belonging to more than one group, our iterative procedure combines proximal methods in the primal space with a projected Newton method in a reduced dual space, corresponding to the active groups. This procedure provides a scalable alternative that requires no data duplication and can handle high-dimensional problems without a pre-processing step for dimensionality reduction. Numerical simulations show empirically the computational advantages of our scheme over state-of-the-art algorithms that rely on data duplication.
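The primal-dual scheme described above can be sketched in plain Python. This is a minimal reconstruction under stated assumptions, not the authors' implementation: we assume a least-squares loss ½‖y − Xβ‖², the overlapping-group penalty Ω(β) = inf{Σ_g ‖v_g‖ : Σ_g v_g = β, supp(v_g) ⊆ g}, plain ISTA as the primal proximal method, and a basic Bertsekas-style projected Newton iteration for the dual. The sketch relies on the identity prox_{τΩ}(x) = x − P_{τK}(x), where K = {α : ‖α_g‖ ≤ 1 for all g} is the dual-norm unit ball, and computes that projection by minimizing the dual function f(λ) = Σ_j x_j²/(1 + Σ_{g∋j} λ_g) + τ² Σ_g λ_g over λ ≥ 0, keeping only the groups with ‖x_g‖ > τ (the multipliers of all other groups can be set to zero). The function names (prox_overlap, ista, _solve), the tolerances and the small ridge term are hypothetical choices made for this sketch.

```python
import math


def _solve(A, b, ridge=1e-10):
    # Solve (A + ridge*I) z = b by Gaussian elimination with partial pivoting.
    # The tiny ridge is our own safeguard against a singular reduced Hessian.
    n = len(b)
    M = [[A[i][k] + (ridge if i == k else 0.0) for k in range(n)] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def prox_overlap(x, groups, tau, tol=1e-10, max_iter=100, sigma=1e-4):
    # prox_{tau*Omega}(x) = x - P_{tau K}(x) with K = {a : ||a_g|| <= 1 for every g}.
    # Only groups with ||x_g|| > tau enter the reduced dual problem.
    active = [g for g in groups if sum(x[j] ** 2 for j in g) > tau ** 2]
    sets = [set(g) for g in active]
    lam = [0.0] * len(active)

    def shifts(lam):
        # s_j = sum of the multipliers of the active groups containing variable j
        s = [0.0] * len(x)
        for l, g in zip(lam, active):
            for j in g:
                s[j] += l
        return s

    def dual(lam):
        # f(lam) = sum_j x_j^2 / (1 + s_j) + tau^2 * sum_g lam_g, minimized over lam >= 0
        s = shifts(lam)
        return sum(xj ** 2 / (1.0 + sj) for xj, sj in zip(x, s)) + tau ** 2 * sum(lam)

    for _ in range(max_iter):
        s = shifts(lam)
        grad = [tau ** 2 - sum(x[j] ** 2 / (1.0 + s[j]) ** 2 for j in g) for g in active]
        # Norm of the projected-gradient step: zero exactly at a KKT point.
        w = math.sqrt(sum((l - max(0.0, l - gk)) ** 2 for l, gk in zip(lam, grad)))
        if w < tol:
            break
        # Projected Newton: multipliers pinned near zero whose gradient pushes them
        # further down get a plain gradient step; the free ones get a Newton step.
        eps = min(1e-3, w)
        binding = [k for k in range(len(active)) if lam[k] <= eps and grad[k] > 0]
        free = [k for k in range(len(active)) if k not in binding]
        H = [[sum(2.0 * x[j] ** 2 / (1.0 + s[j]) ** 3 for j in sets[p] & sets[q])
              for q in free] for p in free]
        direction = list(grad)
        for k, v in zip(free, _solve(H, [grad[k] for k in free])):
            direction[k] = v
        # Armijo backtracking along the projected arc lam(t) = max(0, lam - t*direction).
        f0, t = dual(lam), 1.0
        for _ in range(50):
            new = [max(0.0, l - t * dk) for l, dk in zip(lam, direction)]
            pred = (sum(t * grad[k] * direction[k] for k in free)
                    + sum(grad[k] * (lam[k] - new[k]) for k in binding))
            if f0 - dual(new) >= sigma * pred:
                break
            t *= 0.5
        lam = new
    s = shifts(lam)
    # x - P_{tau K}(x) = x * s / (1 + s); variables outside every active group become 0.
    return [xj * sj / (1.0 + sj) for xj, sj in zip(x, s)]


def ista(X, y, groups, tau, n_iter=300):
    # Forward-backward splitting for 0.5*||y - X b||^2 + tau*Omega(b),
    # working directly on the columns of X (no variable replication).
    n, p = len(X), len(X[0])
    L = sum(v * v for row in X for v in row)  # ||X||_F^2 >= ||X||_2^2, a safe 1/L step
    b = [0.0] * p
    for _ in range(n_iter):
        r = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        g = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
        b = prox_overlap([b[j] - g[j] / L for j in range(p)], groups, tau / L)
    return b
```

As a sanity check, with X the 4×4 identity, y = [3, 4, 0.1, 0.1], non-overlapping groups [[0, 1], [2, 3]] and tau = 1, the iterates approach group soft-thresholding of y, i.e. roughly [2.4, 3.2, 0, 0]. Restricting the Newton system to the active groups is what keeps the dual problem small when only a few groups survive the threshold.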

[1]  Volker Roth, et al.  The Group-Lasso for generalized linear models: uniqueness of solutions and efficient algorithms, 2008, ICML '08.

[2]  P. Bühlmann, et al.  The group lasso for logistic regression, 2008.

[3]  Emmanuel J. Candès, et al.  NESTA: A Fast and Accurate First-Order Method for Sparse Recovery, 2009, SIAM J. Imaging Sci.

[4]  D. Bertsekas.  Projected Newton methods for optimization problems with simple constraints, 1981, CDC 1981.

[5]  Osman Güler, et al.  New Proximal Point Algorithms for Convex Minimization, 1992, SIAM J. Optim.

[6]  Yang Jing.  L1 Regularization Path Algorithm for Generalized Linear Models, 2008.

[7]  I. Loris.  On the performance of algorithms for the minimization of ℓ1-penalized functionals, 2007, arXiv:0710.4082.

[8]  Marc Teboulle, et al.  Fast Gradient-Based Algorithms for Constrained Total Variation Image Denoising and Deblurring Problems, 2009, IEEE Transactions on Image Processing.

[9]  Osman Güler.  On the convergence of the proximal point algorithm for convex minimization, 1991.

[10]  Julien Mairal, et al.  Proximal Methods for Sparse Hierarchical Dictionary Learning, 2010, ICML.

[11]  P. Zhao, et al.  The composite absolute penalties family for grouped and hierarchical variable selection, 2009, arXiv:0909.0411.

[12]  Francis R. Bach, et al.  Consistency of the group Lasso and multiple kernel learning, 2007, J. Mach. Learn. Res.

[13]  Marc Teboulle, et al.  A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, 2009, SIAM J. Imaging Sci.

[14]  Michael I. Jordan, et al.  Multiple kernel learning, conic duality, and the SMO algorithm, 2004, ICML.

[15]  Y. Nesterov.  A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2), 1983.

[16]  Francis R. Bach, et al.  Structured Variable Selection with Sparsity-Inducing Norms, 2009, J. Mach. Learn. Res.

[17]  Lorenzo Rosasco, et al.  Iterative Projection Methods for Structured Sparsity Regularization, 2009.

[18]  Yin Zhang, et al.  Fixed-Point Continuation for ℓ1-Minimization: Methodology and Convergence, 2008, SIAM J. Optim.

[19]  Jean-Philippe Vert, et al.  Group lasso with overlap and graph lasso, 2009, ICML '09.

[20]  Francis R. Bach, et al.  High-Dimensional Non-Linear Variable Selection through Hierarchical Kernel Learning, 2009, arXiv.

[21]  Yoram Singer, et al.  Efficient Online and Batch Learning Using Forward Backward Splitting, 2009, J. Mach. Learn. Res.

[22]  J. Cullum, et al.  An algorithm for minimizing a differentiable function subject to box constraints and errors, 1979.

[23]  J. B. Rosen.  The Gradient Projection Method for Nonlinear Programming. Part I. Linear Constraints, 1960.

[24]  R. Brayton, et al.  An algorithm for minimizing a differentiable function subject to box constraints, 1976, 1976 IEEE Conference on Decision and Control including the 15th Symposium on Adaptive Processes.

[25]  Yurii Nesterov, et al.  Smooth minimization of non-smooth functions, 2005, Math. Program.