A stepwise regression algorithm for high-dimensional variable selection

We propose a new stepwise regression algorithm with a simple stopping rule for identifying influential predictors and interactions among a huge number of variables in various statistical models. As in conventional stepwise regression, at each forward selection step a variable is added to the current model if the test statistic comparing the enlarged model containing that predictor against the current model attains the minimum p-value among all candidates and that p-value is smaller than a predetermined threshold. Instead of relying on conventional information criteria, the threshold is set at a lower percentile of a beta distribution. Extensive simulation studies across a range of statistical models show the proposed algorithm to be highly competitive with, and robust relative to, several popular high-dimensional variable selection methods.
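The abstract does not spell out how the beta-distribution threshold is constructed. Below is a minimal sketch for the linear-model case, assuming the threshold at each step is the lower alpha-percentile of a Beta(1, m) distribution (the distribution of the minimum of m independent Uniform(0,1) p-values, with m the number of remaining candidates) and that the comparison of the enlarged and current models uses a partial F-test. The function and parameter names (`forward_stepwise`, `alpha`, `max_steps`) are illustrative and not taken from the paper.

```python
import numpy as np
from scipy import stats

def forward_stepwise(X, y, alpha=0.05, max_steps=None):
    """Forward stepwise selection with a beta-percentile stopping rule (sketch).

    At each step, every remaining candidate is tested against the current
    model with a partial F-test; the candidate with the smallest p-value is
    added only if that p-value falls below the lower alpha-quantile of a
    Beta(1, m) distribution, where m is the number of remaining candidates
    (assumed null distribution of the minimum of m uniform p-values).
    """
    n, p = X.shape
    selected = []
    remaining = list(range(p))
    max_steps = max_steps or min(n - 2, p)

    def rss(cols):
        # Residual sum of squares of an intercept-plus-`cols` linear model.
        Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        beta, _, _, _ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        return resid @ resid

    rss_cur = rss(selected)
    for _ in range(max_steps):
        m = len(remaining)
        df2 = n - len(selected) - 2        # residual df of the enlarged model
        if m == 0 or df2 <= 0:
            break
        pvals = []
        for j in remaining:
            rss_new = rss(selected + [j])
            f_stat = (rss_cur - rss_new) / (rss_new / df2)
            pvals.append(stats.f.sf(f_stat, 1, df2))
        best = int(np.argmin(pvals))
        threshold = stats.beta.ppf(alpha, 1, m)   # lower alpha-percentile of Beta(1, m)
        if pvals[best] >= threshold:
            break                                 # stopping rule: no candidate passes
        selected.append(remaining.pop(best))
        rss_cur = rss(selected)
    return selected
```

For generalized linear or survival models, the same skeleton applies with the partial F-test replaced by a likelihood ratio or score test of the enlarged model against the current one; this substitution is an assumption consistent with the abstract's reference to "various statistical models" rather than a detail stated there.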
