MPGL: An Efficient Matching Pursuit Method for Generalized LASSO

Unlike the traditional LASSO, which enforces sparsity on the variables themselves, the Generalized LASSO (GL) enforces sparsity on a linear transformation of the variables, gaining flexibility and success in many applications. However, many existing GL algorithms do not scale to high-dimensional problems, and/or work well only for a specific choice of the transformation. We propose an efficient Matching Pursuit Generalized LASSO (MPGL) method, which overcomes these issues and is guaranteed to converge to a global optimum. We formulate the GL problem as a convex quadratically constrained linear programming (QCLP) problem and tailor a cutting-plane method to it. More specifically, MPGL iteratively activates a subset of the nonzero elements of the transformed variables and solves a subproblem involving only the activated elements, thus gaining significant speed-up. Moreover, MPGL is less sensitive to the choice of the trade-off hyper-parameter between data fitting and regularization, mitigating the longstanding hyper-parameter tuning issue of many existing methods. Experiments demonstrate the superior efficiency and accuracy of the proposed method over state-of-the-art methods on both classification and image processing tasks.

Introduction

Learning with sparsity-inducing norms has achieved much success in many applications, including medical data analysis (Tibshirani and Wang 2008), image processing (Rudin, Osher, and Fatemi 1992), and feature selection (Tan, Tsang, and Wang 2014). One efficient way to enforce sparsity on the variables is to use the $\ell_1$-norm, as in LASSO (Tibshirani 1996), instead of the $\ell_0$-norm. Since then, many methods have been proposed that impose additional structural constraints (Huang, Zhang, and Metaxas 2011; Kim and Xing 2010; Tibshirani et al. 2011) to improve the results. One such family of methods is the generalized LASSO (Tibshirani et al. 2011), which promotes sparsity of the variables after a linear transformation (Liu, Yuan, and Ye 2013) rather than sparsity of the variables themselves. The choice of the transformation encodes the structure desired of the variables and often depends on the application.

Generalized LASSO. Let $x \in \mathbb{R}^{n}$ denote the target variable and $D \in \mathbb{R}^{l \times n}$ be a linear transformation operator. A natural way to seek an $x$ with a sparse $Dx$ is (Liu, Yuan, and Ye 2013)

$$\min_{x} \; f(x) + \lambda \|Dx\|_{0}, \qquad (1)$$

where $f : \mathbb{R}^{n} \to \mathbb{R}$ is a loss function (sometimes known as the data-fitting term) that depends on the application, $\|\cdot\|_{0}$ denotes the $\ell_0$-norm regularizer, and $\lambda \ge 0$ is the trade-off hyper-parameter between data fitting and regularization. Letting $A \in \mathbb{R}^{m \times n}$ be a design matrix, $y \in \mathbb{R}^{m}$ a response vector, and $\mathbf{n} \in \mathbb{R}^{m}$ a vector of Gaussian noise, and assuming the linear regression model $y = Ax + \mathbf{n}$, a typical choice of $f$ is $f(x) = \frac{1}{2}\|y - Ax\|_{2}^{2}$, which is used throughout the rest of the paper. Since problem (1) is NP-hard, the following convex relaxation is widely used:

$$\min_{x} \; f(x) + \lambda \|Dx\|_{1}. \qquad (2)$$
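To make the relaxed problem (2) concrete, the sketch below solves it with a standard ADMM splitting $z = Dx$ (Boyd et al. 2011) on a toy fused-lasso instance in which $D$ is the first-difference operator. This is only an illustrative baseline for the convex relaxation, not the MPGL algorithm developed in this paper; the problem sizes, $\rho$, $\lambda$, and the stopping rule are assumptions made for the example.

```python
# Minimal sketch, NOT the MPGL algorithm from this paper: solves the convex
# relaxation (2), min_x 0.5*||y - A x||_2^2 + lam*||D x||_1, with a standard
# ADMM splitting z = D x (Boyd et al. 2011). The fused-lasso choice of D and
# the values of rho, lam, and the stopping rule are illustrative assumptions.
import numpy as np


def soft_threshold(v, kappa):
    """Element-wise soft-thresholding: the proximal operator of kappa*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def generalized_lasso_admm(A, y, D, lam, rho=1.0, n_iter=500, tol=1e-6):
    """ADMM for min_x 0.5*||y - A x||_2^2 + lam*||D x||_1 via the split z = D x."""
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(D.shape[0])
    u = np.zeros(D.shape[0])              # scaled dual variable
    M = A.T @ A + rho * (D.T @ D)         # x-update system (factor once in practice)
    Aty = A.T @ y
    for _ in range(n_iter):
        # x-update: ridge-like least-squares step
        x = np.linalg.solve(M, Aty + rho * (D.T @ (z - u)))
        Dx = D @ x
        # z-update: prox of the l1 term applied to D x + u
        z_new = soft_threshold(Dx + u, lam / rho)
        # dual ascent on the constraint D x = z
        u = u + Dx - z_new
        if np.linalg.norm(z_new - z) <= tol * (1.0 + np.linalg.norm(z)):
            z = z_new
            break
        z = z_new
    return x


if __name__ == "__main__":
    # Toy fused-lasso instance: D is the first-difference operator, so the
    # l1 penalty on D x favours a piecewise-constant solution.
    rng = np.random.default_rng(0)
    n, m = 100, 80
    x_true = np.concatenate([np.zeros(40), 2.0 * np.ones(30), -np.ones(30)])
    A = rng.standard_normal((m, n))
    y = A @ x_true + 0.1 * rng.standard_normal(m)
    D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]   # (n-1) x n first differences
    x_hat = generalized_lasso_admm(A, y, D, lam=5.0)
    print("relative recovery error:",
          np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Solvers of this kind touch all $l$ rows of $D$ at every iteration; MPGL instead activates only a small subset of the nonzero elements of $Dx$ and solves a reduced subproblem, which is where its speed-up comes from.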

References

[1] Frédo Durand, et al. Image and depth from a conventional camera with a coded aperture. ACM Transactions on Graphics, 2007.

[2] R. Tibshirani, et al. Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics, 2008.

[3] Yunzhang Zhu. An Augmented ADMM Algorithm With Application to the Generalized Lasso Problem. 2017.

[4] Jieping Ye, et al. Fused Lasso Screening Rules via the Monotonicity of Subdifferentials. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.

[5] Mingkui Tan, et al. Blind Image Deconvolution by Automatic Gradient Activation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[6] R. Tibshirani, et al. Sparsity and smoothness via the fused lasso. 2005.

[7] Ivor W. Tsang, et al. Matching Pursuit LASSO Part I: Sparse Recovery Over Big Dictionary. IEEE Transactions on Signal Processing, 2015.

[8] T. Golub, et al. Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Research, 2003.

[9] E. Petricoin, et al. Serum proteomic patterns for detection of prostate cancer. Journal of the National Cancer Institute, 2002.

[10] Cun-Hui Zhang. Nearly unbiased variable selection under minimax concave penalty. arXiv:1002.4734, 2010.

[11] Eero P. Simoncelli, et al. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 2004.

[12] Jieping Ye, et al. An efficient algorithm for a class of fused lasso problems. KDD, 2010.

[13] Wen Gao, et al. Efficient Generalized Fused Lasso and its Application to the Diagnosis of Alzheimer's Disease. AAAI, 2014.

[14] Chih-Jen Lin, et al. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research, 2008.

[15] Holger Hoefling. A Path Algorithm for the Fused Lasso Signal Approximator. arXiv:0910.0526, 2009.

[16] Frédo Durand, et al. Image and depth from a conventional camera with a coded aperture. SIGGRAPH, 2007.

[17] Cun-Hui Zhang, et al. The sparsity and bias of the Lasso selection in high-dimensional linear regression. arXiv:0808.0967, 2008.

[18] Stephen P. Boyd, et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 2011.

[19] R. Tibshirani. Regression Shrinkage and Selection via the Lasso. 1996.

[20] Ivor W. Tsang, et al. Convex Matching Pursuit for Large-Scale Sparse Coding and Subset Selection. AAAI, 2012.

[21] Karen O. Egiazarian, et al. Image restoration by sparse 3D transform-domain collaborative filtering. Electronic Imaging, 2008.

[22] Stephen P. Boyd, et al. Cutting-set methods for robust convex optimization with pessimizing oracles. Optimization Methods and Software, 2009.

[23] Jieping Ye, et al. Guaranteed Sparse Recovery under Linear Transformation. ICML, 2013.

[24] Junzhou Huang, et al. Learning with structured sparsity. ICML, 2009.

[25] Ivor W. Tsang, et al. Towards ultrahigh dimensional feature selection for big data. Journal of Machine Learning Research, 2012.

[26] L. Rudin, et al. Nonlinear total variation based noise removal algorithms. 1992.

[27] Eric P. Xing, et al. Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity. ICML, 2009.

[28] R. Tibshirani, et al. Pathwise coordinate optimization. arXiv:0708.1485, 2007.

[29] Shuiwang Ji, et al. SLEP: Sparse Learning with Efficient Projections. 2011.

[30] Johannes O. Royset, et al. On Solving Large-Scale Finite Minimax Problems Using Exponential Smoothing. Journal of Optimization Theory and Applications, 2011.

[31] Junfeng Yang, et al. A New Alternating Minimization Algorithm for Total Variation Image Reconstruction. SIAM Journal on Imaging Sciences, 2008.

[32] J. Mesirov, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 1999.

[33] R. Tibshirani, et al. The solution path of the generalized lasso. arXiv:1005.1971, 2010.

[34] D. Pinkel, et al. Regional copy number–independent deregulation of transcription in cancer. Nature Genetics, 2006.