Multi-step adaptive elastic-net: reducing false positives in high-dimensional variable selection

Regression and variable selection in high-dimensional settings, especially when the number of predictors exceeds the sample size, has been a popular research topic in statistical machine learning, and many successful methods have been developed in recent years. In this paper, we propose the multi-step adaptive elastic-net (MSA-Enet), a multi-step estimation algorithm built upon adaptive elastic-net regularization. Numerical studies on simulated data and real-world biological data sets show that MSA-Enet tends to significantly reduce the number of false-positive variables while maintaining estimation accuracy. By analysing the variables eliminated at each step, further insight can be gained into the structure of correlated variable groups. These properties are desirable in many real-world variable selection and regression problems.
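The multi-step idea can be illustrated with a small sketch: fit a weighted elastic net, derive adaptive penalty weights from the fitted coefficients, drop the variables set to zero, and refit on the survivors. This is our own minimal illustration, not the authors' implementation; the coordinate-descent solver, the weight rule 1/(|β|+ε), and all hyperparameter values are assumptions for demonstration.

```python
import numpy as np

def soft_threshold(z, g):
    """Soft-thresholding operator used in lasso-type coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def enet_cd(X, y, lam, alpha, w, n_iter=300):
    """Coordinate descent for a weighted elastic net:
    (1/2n)||y - Xb||^2 + lam * (alpha * sum_j w_j |b_j| + (1-alpha)/2 * ||b||^2)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual excluding variable j
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, lam * alpha * w[j]) / (
                col_sq[j] + lam * (1 - alpha))
    return beta

def msa_enet(X, y, lam=0.1, alpha=0.5, n_steps=3, eps=1e-4):
    """Multi-step adaptive elastic net (illustrative sketch):
    refit on the surviving variables with weights from the previous step."""
    p = X.shape[1]
    w = np.ones(p)                 # step 1: ordinary (unweighted) elastic net
    active = np.arange(p)
    beta = np.zeros(p)
    for _ in range(n_steps):
        if active.size == 0:
            break
        b = enet_cd(X[:, active], y, lam, alpha, w[active])
        beta[:] = 0.0
        beta[active] = b
        w = 1.0 / (np.abs(beta) + eps)   # adaptive weights: small coefficients
        active = np.flatnonzero(beta)    # get heavy penalties next step;
    return beta                          # zeroed variables are eliminated
```

Because variables eliminated at each step are recorded in the shrinking active set, inspecting which correlated variables drop out together at each step gives the kind of group-structure insight the abstract describes.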
