Efficient Sparse Modeling With Automatic Feature Grouping

For high-dimensional data, it is often desirable to group similar features together during learning. This can reduce the estimation variance and improve the stability of feature selection, leading to better generalization; it can also aid in understanding and interpreting the data. The octagonal shrinkage and clustering algorithm for regression (OSCAR) is a recent sparse-modeling approach that places an ℓ1-regularizer and a pairwise ℓ∞-regularizer on the feature coefficients to encourage such feature grouping. Computationally, however, its optimization procedure is very expensive. In this paper, we propose an efficient solver based on the accelerated gradient method. We show that its key proximal step can be solved by a simple, highly efficient iterative group-merging algorithm. Given d input features, this reduces the empirical time complexity from between O(d^2) and O(d^5) for existing solvers to just O(d). Experimental results on a number of toy and real-world datasets demonstrate that OSCAR is a competitive sparse-modeling approach, with the added ability of automatic feature grouping.
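To make the objects in the abstract concrete, the sketch below computes the OSCAR penalty and a proximal step on sorted absolute values using a pool-adjacent-violators-style merging pass. This is an illustrative reconstruction, not the authors' exact algorithm: the function names are ours, and the decreasing-weight thresholding plus isotonic merging is one standard way such a group-merging prox can be realized.

```python
import numpy as np

def oscar_penalty(w, lam1, lam2):
    """OSCAR penalty: lam1 * ||w||_1 + lam2 * sum_{i<j} max(|w_i|, |w_j|)."""
    a = np.abs(w)
    # In ascending order, the k-th smallest value is the max in exactly
    # k of the pairs (with each of the k smaller entries), so it is
    # counted k times in the pairwise sum.
    a_sorted = np.sort(a)
    pairwise = np.sum(a_sorted * np.arange(len(a)))
    return lam1 * a.sum() + lam2 * pairwise

def oscar_prox(v, lam1, lam2):
    """Illustrative proximal step for the OSCAR penalty (group-merging sketch).

    Sorts |v| in decreasing order, subtracts decreasing weights
    lam1 + lam2*(d-1), ..., lam1, then restores monotonicity by merging
    adjacent violating blocks (averaging them), which is what ties
    coefficients together into groups.
    """
    d = len(v)
    sign = np.sign(v)
    a = np.abs(v)
    order = np.argsort(-a)          # indices that sort |v| in decreasing order
    z = a[order] - (lam1 + lam2 * np.arange(d - 1, -1, -1))
    # Pool-adjacent-violators pass: maintain a stack of (sum, count) blocks;
    # merge whenever a later block's mean exceeds (or ties) the previous one.
    sums, counts = [], []
    for zi in z:
        s, c = zi, 1
        while sums and sums[-1] / counts[-1] <= s / c:
            s += sums.pop()
            c += counts.pop()
        sums.append(s)
        counts.append(c)
    out = np.empty(d)
    pos = 0
    for s, c in zip(sums, counts):
        out[pos:pos + c] = max(s / c, 0.0)   # clip at zero for sparsity
        pos += c
    res = np.empty(d)
    res[order] = out                # undo the sort
    return sign * res
```

Note how merged blocks receive a common value: for example, `oscar_prox(np.array([2.0, 2.0]), 0.0, 1.0)` maps both coefficients to 1.5, i.e., they are shrunk into a single group. The merging pass touches each entry a constant number of times, consistent with the near-linear behavior the abstract describes (up to the initial sort).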
