Regression model selection using genetic algorithms

The selection of independent variables in a regression model is often a challenging problem. Ideally, one seeks the regression model that is most adequate according to a given statistical criterion. This task can be tackled with techniques such as expert-based selection, stepwise regression, and stochastic search heuristics such as genetic algorithms (GA). In this study, we investigate the performance of two GAs: one for regressor selection (GARS) and one for regressor selection with transformation of the regressors (GARST). We compare their performance with that of stepwise regression on the "Fat Measurement" and "Cholesterol Measurement" datasets, using the AIC, BIC and SIC statistical criteria to quantify model adequacy. GARS outperforms both forward and backward stepwise regression on all of these criteria, although not always in terms of the R² and RMSE statistics. GARST performs better still than GARS, as variable transformations further improve the results. Moreover, the selected transformations reveal the form of the relationships between the dependent and independent variables.
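To make the idea of criterion-driven GA selection concrete, the following is a minimal Python sketch. It is not the authors' GARS or GARST implementation: the integer gene coding (0 = excluded, 1 = identity, 2 = natural log, 3 = square), tournament selection, one-point crossover, per-gene mutation, elitism, and the use of BIC = n ln(RSS/n) + k ln(n) as the fitness to minimize are all assumptions chosen for illustration, and the names `ols_rss`, `bic` and `ga_select` are hypothetical. Setting `n_genes=2` restricts genes to include/exclude, which mimics GARS; `n_genes=4` adds transformations in the spirit of GARST.

```python
import math
import random

# Hypothetical GARST-style gene coding (an assumption, not taken from the paper):
# 0 = regressor excluded, 1 = identity, 2 = natural log, 3 = square.
# With n_genes=2 only codes 0/1 are used, i.e. plain GARS-style selection.
TRANSFORMS = [None, lambda x: x, math.log, lambda x: x * x]


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit with intercept, or None if singular."""
    n = len(y)
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(A[0])
    # Augmented normal equations [A^T A | A^T y].
    M = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)]
         + [sum(A[k][i] * y[k] for k in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            return None  # singular design matrix: treat the model as invalid
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    return sum((y[k] - sum(b * a for b, a in zip(beta, A[k]))) ** 2 for k in range(n))


def bic(chrom, data, y):
    """BIC = n ln(RSS/n) + k ln(n), k = coefficients incl. intercept; lower is better."""
    cols = [i for i, g in enumerate(chrom) if g != 0]
    try:
        X = [[TRANSFORMS[chrom[i]](row[i]) for i in cols] for row in data]
    except ValueError:  # e.g. log of a non-positive value
        return math.inf
    rss = ols_rss(X, y)
    if rss is None:
        return math.inf
    n, k = len(y), len(cols) + 1
    return n * math.log(max(rss, 1e-300) / n) + k * math.log(n)


def ga_select(data, y, n_genes=4, pop=30, gens=40, p_mut=0.1, seed=0):
    """Search over gene vectors for the lowest-BIC regression model."""
    rng = random.Random(seed)
    m = len(data[0])
    cache = {}

    def fit(c):  # score each distinct chromosome only once
        key = tuple(c)
        if key not in cache:
            cache[key] = bic(c, data, y)
        return cache[key]

    def tournament():  # binary tournament selection
        a, b = rng.sample(popn, 2)
        return a if fit(a) <= fit(b) else b

    popn = [[rng.randrange(n_genes) for _ in range(m)] for _ in range(pop)]
    best = min(popn, key=fit)
    for _ in range(gens):
        children = [best[:]]  # elitism: the best model found so far survives
        while len(children) < pop:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, m) if m > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [rng.randrange(n_genes) if rng.random() < p_mut else g
                     for g in child]  # per-gene random-reset mutation
            children.append(child)
        popn = children
        best = min(popn, key=fit)
    return best, fit(best)


if __name__ == "__main__":
    # Synthetic data: y depends on log(x0) and x1; x2 and x3 are pure noise.
    rng = random.Random(1)
    data = [[rng.uniform(1, 10) for _ in range(4)] for _ in range(60)]
    y = [2 + 3 * math.log(r[0]) + 0.5 * r[1] + rng.gauss(0, 0.1) for r in data]
    print("GARS-style: ", ga_select(data, y, n_genes=2))
    print("GARST-style:", ga_select(data, y, n_genes=4))
```

Switching to an AIC-based fitness only changes the penalty term from k ln(n) to 2k. Because each distinct chromosome is fitted once and cached, the cost is driven mainly by how many distinct models the search visits rather than by the number of generations.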