Multi-objective variable subset selection using heterogeneous surrogate modeling and sequential design

Constructing surrogate models of high-dimensional, complex black-box systems from simulation-based data requires an appropriate choice of surrogate model type, as well as identification of the most influential input parameters. Because including irrelevant input parameters lengthens surrogate model training and can increase the risk of overfitting, it is important to identify a small set of relevant parameters during the adaptive modeling phase of the surrogate modeling process. A multi-objective optimization step is proposed to identify both the appropriate model type and a parameter subset. The resulting model can be used for evaluation-intensive applications such as exploration, sensitivity analysis, or optimization.
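
To make the idea of a joint selection concrete, the following is a minimal sketch, not the method of the paper. It assumes two objectives to be minimized, a validation error and the number of retained inputs, and keeps the non-dominated (model type, parameter subset) pairs. The candidate list, the model names, and the scores are hypothetical and chosen only for illustration.

```python
def dominates(a, b):
    """Return True if objective tuple a dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates whose objectives are not dominated by any other candidate."""
    return [
        c for c in candidates
        if not any(dominates(o["objectives"], c["objectives"]) for o in candidates)
    ]


# Hypothetical candidates: objectives = (validation error, number of inputs used).
# All values are invented for illustration, not results from the paper.
candidates = [
    {"model": "kriging", "subset": (0, 1, 2, 3), "objectives": (0.020, 4)},
    {"model": "kriging", "subset": (0, 2), "objectives": (0.025, 2)},
    {"model": "svm", "subset": (0, 2), "objectives": (0.040, 2)},
    {"model": "ann", "subset": (2,), "objectives": (0.150, 1)},
    {"model": "ann", "subset": (0, 1, 2, 3), "objectives": (0.030, 4)},
]

front = pareto_front(candidates)
for c in front:
    print(c["model"], c["subset"], c["objectives"])
```

In this toy setting the front contains the trade-offs between accuracy and subset size; a practitioner would then pick one point, for example the smallest subset whose error is still acceptable.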
