Darwinian Evolution in Parallel Universes: A Parallel Genetic Algorithm for Variable Selection

The need to identify a few important variables that affect an outcome of interest arises commonly in industrial engineering applications. The genetic algorithm (GA) appears to be a natural tool for solving such a problem. In this article, we first demonstrate that the GA is actually not a particularly effective variable selection tool, and then propose a very simple modification. Our idea is to run a number of GAs in parallel without allowing any of them to fully converge, and to consolidate the information from all the individual GAs at the end. We call the resulting algorithm the parallel genetic algorithm (PGA). Using a number of simulated and real examples, we show that the PGA is an interesting, highly competitive, and easy-to-use variable selection tool.
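The idea of running several short GAs and pooling their results can be illustrated with a minimal Python sketch. This is one possible reading of the description above, not the authors' implementation. The abstract does not specify the fitness criterion, the GA operators, the number of generations, or the consolidation rule, so the following are all assumptions made for illustration: AIC as the fitness, elitist truncation selection, uniform crossover, bit-flip mutation, and consolidation by averaging each variable's inclusion frequency over the final populations. All function names and parameter values are hypothetical, and the "parallel" runs are executed one after another here because they are independent of each other.

```python
import math
import random

def make_aic_fitness(X, y):
    """Return a cached fitness: negative AIC of an OLS fit (assumed criterion)."""
    n, cache = len(y), {}

    def rss(cols):
        rows = [[1.0] + [r[j] for j in cols] for r in X]  # intercept + chosen columns
        k = len(rows[0])
        # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
        A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
             + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
        for i in range(k):
            p = max(range(i, k), key=lambda q: abs(A[q][i]))  # partial pivoting
            A[i], A[p] = A[p], A[i]
            for q in range(k):
                if q != i:
                    f = A[q][i] / A[i][i]
                    A[q] = [a - f * b for a, b in zip(A[q], A[i])]
        beta = [A[i][k] / A[i][i] for i in range(k)]
        return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
                   for r, t in zip(rows, y))

    def fitness(mask):
        if mask not in cache:
            cols = [j for j, bit in enumerate(mask) if bit]
            cache[mask] = -(n * math.log(rss(cols) / n) + 2 * (len(cols) + 1))
        return cache[mask]

    return fitness

def run_short_ga(fitness, p, rng, pop_size=20, generations=6, mutation=0.05):
    """Run one GA for only a few generations and return its final population."""
    pop = [tuple(rng.randint(0, 1) for _ in range(p)) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half of the population (assumed selection scheme).
        survivors = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [1 - g if rng.random() < mutation else g for g in child]  # mutation
            children.append(tuple(child))
        pop = survivors + children
    return pop

def parallel_ga(X, y, n_runs=20, seed=0, **ga_args):
    """Average each variable's inclusion frequency over all final populations."""
    p = len(X[0])
    fitness = make_aic_fitness(X, y)
    totals = [0.0] * p
    for run in range(n_runs):  # independent runs; could be executed concurrently
        pop = run_short_ga(fitness, p, random.Random(seed + run), **ga_args)
        for j in range(p):
            totals[j] += sum(m[j] for m in pop) / len(pop)
    return [t / n_runs for t in totals]

def simulate(n=60, p=8, seed=1):
    """Toy data (made up for this sketch): only x0 and x3 affect y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * r[0] + 3 * r[3] + rng.gauss(0, 1) for r in X]
    return X, y

if __name__ == "__main__":
    X, y = simulate()
    freq = parallel_ga(X, y)
    for j in sorted(range(len(freq)), key=lambda j: -freq[j]):
        print(f"x{j}: {freq[j]:.2f}")
```

In this sketch, variables are ranked by their averaged inclusion frequency, and a clear gap between the leading variables and the rest would suggest which ones to keep. Stopping each run after a handful of generations is what keeps the individual populations from collapsing onto a single model, so the pooled frequencies still carry information about every variable.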
