Grouping Variable Selection by Weight Fused Elastic Net for Multi-Collinear Data

In this article, we consider variable selection and estimation for strongly correlated, multi-collinear data using grouping variable selection techniques. We propose a new grouping variable selection method, the weight-fused elastic net (WFEN), to handle high-dimensional collinear data. The proposed model combines two different grouping-effect mechanisms, induced by the elastic net and the weight-fused LASSO respectively, and can be easily cast in the LASSO framework and computed efficiently. Results on simulated and real data sets show that our method is competitive with other related methods, especially when the data exhibit high multi-collinearity.
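
The claim that the model can be cast in the LASSO framework can be illustrated with a short derivation. This is only a sketch under stated assumptions: the abstract does not give the exact objective, so the penalty form below, the tuning parameters \lambda_1, \lambda_2, \lambda_3, the nonnegative fusion weights w_{ij}, and the signs s_{ij} \in \{-1, +1\} are all assumed here for illustration, following the usual elastic net and weighted fusion penalties.

```latex
% Assumed objective (not taken from the abstract): an L1 term, a ridge term,
% and a weighted fusion term over pairs of coefficients.
\[
\hat{\beta} = \arg\min_{\beta}\;
  \|y - X\beta\|_2^2
  + \lambda_1 \|\beta\|_1
  + \lambda_2 \|\beta\|_2^2
  + \lambda_3 \sum_{i<j} w_{ij}\,(\beta_i - s_{ij}\beta_j)^2 .
\]

% Since s_{ij}^2 = 1, the fusion term is a quadratic form \beta^\top L \beta with
\[
L_{ii} = \sum_{j \neq i} w_{ij}, \qquad
L_{ij} = -\,s_{ij}\, w_{ij} \quad (i \neq j),
\]
% and L is positive semidefinite because the term is a nonnegative sum of squares.

% Collect both quadratic penalties and factor them (Cholesky, valid for \lambda_2 > 0):
\[
Q = \lambda_2 I_p + \lambda_3 L = R^\top R .
\]

% Augment the data with p pseudo-observations:
\[
X^{*} = \begin{pmatrix} X \\ R \end{pmatrix}, \qquad
y^{*} = \begin{pmatrix} y \\ 0_p \end{pmatrix},
\]
% so that
\[
\|y^{*} - X^{*}\beta\|_2^2 + \lambda_1 \|\beta\|_1
= \|y - X\beta\|_2^2 + \beta^\top Q\, \beta + \lambda_1 \|\beta\|_1 .
\]
```

Under these assumptions the estimator is an ordinary LASSO problem on the augmented data (X^{*}, y^{*}), so any standard LASSO solver could be applied to it. The weights and any rescaling actually used by the authors may differ from this sketch.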
