Using oversized models to find active variables in screening experiments

Nonregular factorial designs can be used to conduct screening experiments involving many factors and their interactions in a small number of runs. Linear model selection is challenging in this setting because the design is not orthogonal, the number of potential models is huge, and the number of observations is small. A new procedure is proposed to aid model selection in such cases. A non-convergent simulated annealing algorithm is used to generate a large set of good models that are larger than the true model is expected to be; common submodels within this set are then identified using visualization techniques. An automatic method of extracting the best smaller model from the oversized-model set is also proposed. The new method performs well and provides graphical output that can be very helpful in decision making. Although developed for industrial screening experiments, it can be applied to any suitable regression problem.
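The oversized-model idea can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the fit criterion, the move set, the temperature, or the extraction rule, so the following are all assumptions made for illustration: residual sum of squares as the criterion, single-term swaps at a fixed model size, a constant temperature (so the search never converges), and a plain term-frequency tally standing in for the visualization step. The function names `rss`, `oversized_model_search`, and `term_frequencies` are hypothetical, and the demonstration uses simulated data on a 12-run Plackett-Burman design with main effects only.

```python
import math
import random
from collections import Counter

def rss(X, y, cols):
    """Residual sum of squares for least squares of y on an intercept plus X[:, cols]."""
    n, p = len(y), len(cols) + 1
    A = [[1.0] + [X[i][j] for j in cols] for i in range(n)]
    # Augmented normal equations [A'A | A'y]; a tiny ridge guards against singularity.
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) + (1e-8 if r == c else 0.0)
          for c in range(p)] + [sum(A[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    for k in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(k, p), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(p):
            if r != k:
                f = M[r][k] / M[k][k]
                M[r] = [a - f * b for a, b in zip(M[r], M[k])]
    beta = [M[r][p] / M[r][r] for r in range(p)]
    return sum((y[i] - sum(b * a for b, a in zip(beta, A[i]))) ** 2 for i in range(n))

def oversized_model_search(X, y, size, steps=2000, temp=1.0, keep=10, seed=0):
    """Constant-temperature annealing over models with exactly `size` terms.

    Returns the `keep` lowest-RSS models visited, as (sorted term tuple, RSS) pairs.
    """
    rng = random.Random(seed)
    n_terms = len(X[0])
    current = rng.sample(range(n_terms), size)
    cur_rss = rss(X, y, current)
    visited = {tuple(sorted(current)): cur_rss}
    for _ in range(steps):
        # Propose swapping one term in the model for one term outside it.
        proposal = list(current)
        outside = [j for j in range(n_terms) if j not in current]
        proposal[rng.randrange(size)] = rng.choice(outside)
        new_rss = rss(X, y, proposal)
        # Metropolis rule; the temperature is never lowered, so the chain keeps exploring.
        if new_rss <= cur_rss or rng.random() < math.exp((cur_rss - new_rss) / temp):
            current, cur_rss = proposal, new_rss
            visited[tuple(sorted(current))] = cur_rss
    return sorted(visited.items(), key=lambda kv: kv[1])[:keep]

def term_frequencies(models):
    """Fraction of the retained models in which each term appears."""
    counts = Counter(j for cols, _ in models for j in cols)
    return {j: c / len(models) for j, c in counts.items()}

# Simulated example: 12-run Plackett-Burman design from its cyclic generator row.
gen = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]
X = [gen[-s:] + gen[:-s] for s in range(11)] + [[-1] * 11]
noise = random.Random(1)
# Only terms 0 and 3 are truly active; the search uses 4-term (oversized) models.
y = [3.0 * row[0] - 2.0 * row[3] + noise.gauss(0, 0.1) for row in X]

models = oversized_model_search(X, y, size=4)
freq = term_frequencies(models)
for term, share in sorted(freq.items(), key=lambda kv: -kv[1]):
    print(f"term {term}: appears in {share:.0%} of retained models")
```

In this sketch the active terms are the ones shared by essentially all retained oversized models, while inactive terms come and go. To screen interactions as well, the candidate columns of `X` would be extended with products of factor columns, at which point the columns are no longer orthogonal and the tally becomes more informative than any single fitted model.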
