Exclusive Feature Learning on Arbitrary Structures via ℓ1,2-norm

Group LASSO is widely used to enforce structural sparsity: it achieves sparsity at the inter-group level, selecting or discarding whole groups of features. In this paper, we propose a new formulation called "exclusive group LASSO", which instead induces sparsity at the intra-group level in the context of feature selection. The proposed exclusive group LASSO applies to any feature structure, whether the groups overlap or not. We analyze the properties of exclusive group LASSO and propose an effective iteratively re-weighted algorithm to solve the corresponding optimization problem, together with a rigorous convergence analysis. We show applications of exclusive group LASSO to uncorrelated feature selection. Extensive experiments on both synthetic and real-world datasets validate the proposed method.
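To make the inter-group versus intra-group distinction concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not state the regularizer, so the sketch assumes the commonly used exclusive penalty, the sum over groups of the squared ℓ1 norm of each group's weights, and compares it with the standard group LASSO penalty (the sum of per-group ℓ2 norms). The function names, the group layout, and the weight vectors are hypothetical.

```python
import numpy as np

def exclusive_group_penalty(w, groups):
    """Assumed l1,2 penalty: sum over groups of (l1 norm of the group's weights)^2."""
    w = np.asarray(w, dtype=float)
    return float(sum(np.abs(w[list(g)]).sum() ** 2 for g in groups))

def group_lasso_penalty(w, groups):
    """Standard group LASSO penalty: sum over groups of the group's l2 norm."""
    w = np.asarray(w, dtype=float)
    return float(sum(np.linalg.norm(w[list(g)]) for g in groups))

# Hypothetical structure: two groups of two features each.
# Groups are plain index lists, so overlapping groups work the same way.
groups = [[0, 1], [2, 3]]

spread = np.array([1.0, 0.0, 1.0, 0.0])  # one active feature in each group
packed = np.array([1.0, 1.0, 0.0, 0.0])  # both active features in one group

exclusive_spread = exclusive_group_penalty(spread, groups)
exclusive_packed = exclusive_group_penalty(packed, groups)
group_spread = group_lasso_penalty(spread, groups)
group_packed = group_lasso_penalty(packed, groups)

print("exclusive:", exclusive_spread, exclusive_packed)
print("group    :", group_spread, group_packed)
```

Under this assumed penalty, the weight vector that keeps one feature per group is cheaper than the one that packs both active features into a single group, while the group LASSO penalty prefers the opposite. That is the sense in which the exclusive formulation encourages sparsity within groups, not between them.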
