Feature Grouping Using Weighted ℓ1 Norm for High-Dimensional Data

Building effective predictive models from high-dimensional data is an important problem in domains such as bioinformatics, healthcare analytics and general regression analysis. When such data contain many correlated features, automatically extracting feature groups is necessary so that regularizers such as the group lasso can exploit the recovered grouping structure to build effective prediction models. The elastic net, the fused lasso and the Octagonal Shrinkage and Clustering Algorithm for Regression (OSCAR) are popular feature grouping methods that recover both sparsity and feature groups from the data. However, their predictive ability degrades when the regression coefficients of adjacent feature groups are similar but not exactly equal, because these methods erroneously merge such adjacent groups; this is known as the misfusion problem. To address this problem, we propose a weighted ℓ1 norm-based approach that recovers feature groups reliably even when the coefficients of adjacent groups are close, yielding highly accurate predictive models. The resulting convex optimization problem is solved using the fast iterative shrinkage-thresholding algorithm (FISTA). On synthetic datasets, we show that our approach resolves the misfusion problem more effectively than existing feature grouping methods such as the elastic net, the fused lasso and OSCAR. We also evaluate the model's predictive performance on real-world breast cancer gene expression data and the 20 Newsgroups dataset.
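The abstract states that the weighted ℓ1 formulation is solved with FISTA. The paper's specific weighting scheme is not reproduced here, so the following is only a minimal sketch of a FISTA solver for a generic weighted ℓ1-regularized least-squares objective, 0.5*||Xb - y||^2 + sum_i w_i*|b_i|; the fixed weight vector w, the function names and the synthetic example are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def weighted_soft_threshold(u, thresh):
        # Proximal operator of the weighted l1 norm: elementwise shrinkage,
        # where thresh may be a scalar or a per-coordinate array of weights.
        return np.sign(u) * np.maximum(np.abs(u) - thresh, 0.0)

    def fista_weighted_l1(X, y, w, n_iter=500):
        # Minimize 0.5*||X b - y||^2 + sum_i w_i * |b_i| with FISTA
        # (Beck & Teboulle, 2009). w is an assumed fixed weight vector.
        L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
        b = np.zeros(X.shape[1])
        z, t = b.copy(), 1.0
        for _ in range(n_iter):
            grad = X.T @ (X @ z - y)           # gradient of the smooth term at z
            b_next = weighted_soft_threshold(z - grad / L, w / L)
            t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            z = b_next + ((t - 1.0) / t_next) * (b_next - b)  # momentum step
            b, t = b_next, t_next
        return b

    # Hypothetical misfusion scenario: adjacent coefficient groups that are
    # similar (2.0 vs. 1.9) but not exactly equal.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 30))
    beta = np.repeat([2.0, 1.9, 0.0], 10)
    y = X @ beta + 0.1 * rng.standard_normal(100)
    b_hat = fista_weighted_l1(X, y, w=0.5)

The key point is that the proximal step for a weighted ℓ1 penalty is an elementwise weighted soft-thresholding, so each FISTA iteration costs only one gradient evaluation plus O(p) shrinkage, which keeps the approach practical in high dimensions.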
