Efficient Algorithms for Selecting Features with Arbitrary Group Constraints via Group Lasso

Feature structure information plays an important role in regression and classification tasks. We consider a generic formulation, the group lasso problem, in which structure over the feature space is encoded by combining features into groups. These groups may be overlapping or non-overlapping and can express different structures, e.g., structures over a line, a tree, a graph, or even a forest. We propose a new approach to this generic group lasso problem, in which features are selected in groups and an arbitrary family of feature subsets is allowed. We solve the problem with the accelerated proximal gradient method, whose key step is evaluating the associated proximal operator. We propose a fast method for computing this proximal operator and rigorously prove its convergence. Experimental results on different structures (e.g., group, tree, and graph structures) demonstrate the efficiency and effectiveness of the proposed algorithm.
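As a minimal sketch of this pipeline (not the paper's solver), the Python code below runs the accelerated proximal gradient method (FISTA) on a least-squares group lasso objective, using the closed-form block soft-thresholding proximal operator. That closed form is exact only for non-overlapping groups; the overlapping case is precisely what the paper's fast proximal-operator method addresses, and it is not implemented here. The function names, group layout, and toy data are illustrative assumptions.

```python
import numpy as np

def prox_group_lasso(v, groups, thresh):
    """Block soft-thresholding: prox of thresh * sum_g ||v_g||_2.
    Exact for NON-overlapping groups only (an assumption of this sketch)."""
    out = v.copy()
    for g in groups:
        norm = np.linalg.norm(v[g])
        out[g] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * v[g]
    return out

def fista_group_lasso(A, b, groups, lam, n_iter=500):
    """FISTA for min_x 0.5*||Ax - b||^2 + lam * sum_g ||x_g||_2."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    step = 1.0 / L
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)           # gradient of the smooth part at y
        x_new = prox_group_lasso(y - step * grad, groups, lam * step)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum step
        x, t = x_new, t_new
    return x

# Toy usage: 3 disjoint groups of 2 features each; only the first is active.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 6))
x_true = np.array([1.5, -2.0, 0.0, 0.0, 0.0, 0.0])
b = A @ x_true + 0.01 * rng.standard_normal(50)
groups = [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6)]
print(fista_group_lasso(A, b, groups, lam=1.0))
```

In this toy run the recovered vector is nonzero on the first group and (near-)zero on the others, illustrating how the proximal step zeroes out entire feature groups at once.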
