Linearized alternating direction method of multipliers for sparse group and fused LASSO models

The least absolute shrinkage and selection operator (LASSO) plays an important role in variable selection and dimensionality reduction for linear regression. In this paper we focus on two general LASSO models, the Sparse Group LASSO and the Fused LASSO, and apply the linearized alternating direction method of multipliers (LADMM) to solve them. We show that LADMM is a simple and efficient way to solve these models numerically, and we compare it with several benchmark approaches on both synthetic and real datasets.
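
To make the idea concrete, the following is a minimal sketch of a generic linearized ADMM iteration for the Sparse Group LASSO with non-overlapping groups, written for the objective 0.5*||Ax - b||^2 + lam1*||x||_1 + lam2*sum_g ||x_g||_2. It is not claimed to be the exact splitting or parameter choice used in the paper. The function names, the auxiliary variable r = Ax - b, the scaled dual variable u, the penalty beta, and the proximal weight tau are all assumptions made for illustration.

```python
import numpy as np

def prox_sparse_group(v, groups, t1, t2):
    """Prox of t1*||x||_1 + t2*sum_g ||x_g||_2 (non-overlapping groups)."""
    # Step 1: elementwise soft-thresholding handles the l1 term.
    x = np.sign(v) * np.maximum(np.abs(v) - t1, 0.0)
    # Step 2: groupwise shrinkage handles the group l2 term.
    for g in groups:
        norm_g = np.linalg.norm(x[g])
        x[g] = 0.0 if norm_g <= t2 else (1.0 - t2 / norm_g) * x[g]
    return x

def ladmm_sparse_group_lasso(A, b, groups, lam1, lam2, beta=1.0, n_iter=2000):
    """Hypothetical linearized ADMM sketch with the constraint Ax - r = b."""
    m, n = A.shape
    # Assumed proximal weight: slightly larger than the squared spectral norm of A.
    tau = 1.01 * np.linalg.norm(A, 2) ** 2
    x, r, u = np.zeros(n), np.zeros(m), np.zeros(m)
    for _ in range(n_iter):
        # Linearize the quadratic penalty at the current x, then apply the prox.
        grad = A.T @ (A @ x - r - b + u)
        x = prox_sparse_group(x - grad / tau, groups,
                              lam1 / (beta * tau), lam2 / (beta * tau))
        # Closed-form minimizer of 0.5*||r||^2 + (beta/2)*||Ax - r - b + u||^2.
        residual = A @ x - b
        r = beta * (residual + u) / (1.0 + beta)
        # Scaled dual update.
        u = u + residual - r
    return x
```

Each iteration costs two matrix-vector products plus a closed-form shrinkage, so no linear system is solved. One way to sanity-check the sketch is to set A to the identity, in which case the minimizer should coincide with `prox_sparse_group(b, groups, lam1, lam2)`.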
