Efficient computer experiment-based optimization through variable selection

A computer experiment-based optimization approach employs design of experiments and statistical modeling to represent a complex objective function that can only be evaluated pointwise by running a computer model. In large-scale applications, the number of variables is very large, and direct use of computer experiments would require an exceedingly large experimental design and, consequently, significant computational effort. If a large portion of the variables have little impact on the objective, these variables should be eliminated before the complete set of computer experiments is performed. This is a variable selection task. The ideal variable selection method for this task should handle unknown nonlinear structure, should be computationally fast, and should be applicable after only a small number of computer experiment runs, likely fewer runs (N) than the number of variables (P). Conventional variable selection techniques are based on assumed linear model forms and cannot be applied in this “large P and small N” problem. In this paper, we present a framework that adds a variable selection step prior to computer experiment-based optimization, and we consider data mining methods, using principal components analysis and multiple testing based on the false discovery rate, that are appropriate for our variable selection task. An airline fleet assignment case study is used to illustrate our approach.
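
To make the multiple-testing idea concrete, the following is a minimal sketch of false discovery rate screening using the Benjamini-Hochberg step-up procedure, a standard FDR-controlling method. The abstract does not say which FDR procedure the paper uses or how the per-variable p-values are obtained, so the function name `benjamini_hochberg`, the level `q`, and the p-values below are illustrative assumptions, not the authors' implementation.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of variables kept by the BH step-up procedure.

    p_values: one p-value per candidate variable (hypothetical inputs here).
    q: target false discovery rate (an assumed default, not from the paper).
    """
    m = len(p_values)
    # Rank the variables from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= k * q / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank
    # Keep every variable ranked at or below k (step-up rule).
    return sorted(order[:k])


# Hypothetical p-values for ten candidate variables.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(p, q=0.05)
print(selected)
```

With these made-up inputs, only the first two variables pass the screen. In a setting like the one described above, the retained variables would then be carried forward to the full computer experiment.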
