Boosting variable selection algorithm for linear regression models

For variable selection in linear regression models, this paper proposes a novel boosting method that uses a genetic algorithm as its base learner. The main idea is as follows: each training example is first assigned a weight, and a genetic algorithm serves as the base learning algorithm of the boosting procedure. The training set, together with its weight distribution, is then passed to the genetic algorithm to perform variable selection. The weight distribution is subsequently updated according to the quality of the previous selection result. After repeating these steps multiple times, the individual results are fused through a weighted combination rule. The performance of the proposed method is investigated on several simulated data sets. The experimental results show that boosting can significantly improve the variable selection performance of a genetic algorithm and accurately identify the relevant variables.
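The abstract outlines the procedure but not its exact weight-update or fusion rules, so the following is a minimal sketch in Python (NumPy only), assuming an AdaBoost.R2-style weight update, a simple truncation-selection genetic algorithm whose fitness is weighted MSE plus a size penalty, and confidence-weighted voting as the combination rule. The function names `ga_select` and `boosted_selection` and all parameter values are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_fit_error(X, y, mask, w):
    """Per-example squared residuals of a weighted least-squares fit on the selected columns."""
    if not mask.any():
        return (y - np.average(y, weights=w)) ** 2
    Xs, sw = X[:, mask], np.sqrt(w)
    beta, *_ = np.linalg.lstsq(sw[:, None] * Xs, sw * y, rcond=None)
    return (y - Xs @ beta) ** 2

def ga_select(X, y, w, pop=40, gens=30, pmut=0.05):
    """Toy GA over binary inclusion masks; fitness = weighted MSE plus a size penalty (assumed)."""
    p = X.shape[1]
    popu = rng.random((pop, p)) < 0.5
    fitness = lambda m: np.average(weighted_fit_error(X, y, m, w), weights=w) + 0.01 * m.sum()
    for _ in range(gens):
        popu = popu[np.argsort([fitness(m) for m in popu])]
        elite = popu[: pop // 2]                                 # truncation selection
        pa = elite[rng.integers(0, len(elite), pop - len(elite))]
        pb = elite[rng.integers(0, len(elite), pop - len(elite))]
        cross = np.where(rng.random(pa.shape) < 0.5, pa, pb)     # uniform crossover
        popu = np.vstack([elite, cross ^ (rng.random(cross.shape) < pmut)])  # bit-flip mutation
    return min(popu, key=fitness)

def boosted_selection(X, y, rounds=10):
    n, p = X.shape
    w = np.full(n, 1.0 / n)                      # uniform initial weight distribution
    votes = np.zeros(p)
    for _ in range(rounds):
        mask = ga_select(X, y, w)
        sq = weighted_fit_error(X, y, mask, w)
        loss = sq / (sq.max() + 1e-12)           # bounded per-example loss, AdaBoost.R2 style
        err = max(np.sum(w * loss), 1e-12)
        if err >= 0.5:                           # base learner too weak: restart with uniform weights
            w = np.full(n, 1.0 / n)
            continue
        beta_t = err / (1.0 - err)
        votes += np.log(1.0 / beta_t) * mask     # confidence-weighted vote for selected variables
        w *= beta_t ** (1.0 - loss)              # down-weight examples the current model fits well
        w /= w.sum()
    return votes / votes.sum()

# Demo on simulated data: y depends only on the first three of ten candidate variables.
X = rng.standard_normal((200, 10))
y = X[:, 0] + 2 * X[:, 1] - 1.5 * X[:, 2] + 0.5 * rng.standard_normal(200)
print(np.round(boosted_selection(X, y), 3))
```

On this simulated example, the largest normalized vote shares should concentrate on the three truly relevant predictors; thresholding the shares would yield the final selected subset.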
