An alternative approach to avoid overfitting for surrogate models

Surrogate models are data-driven models used to accurately mimic the complex behavior of a system. They are often used to approximate computationally expensive simulation code in order to speed up the exploration of design spaces. A crucial step in building a surrogate model is finding a good set of hyperparameters, which determine the behavior of the model. This is especially important when the data are sparse, because the models are then more prone to overfitting. Cross-validation is often used to optimize the hyperparameters of surrogate models; however, it is computationally expensive and can still lead to overfitting or other erratic model behavior. This paper introduces a new auxiliary measure for optimizing the hyperparameters of surrogate models which, when used in conjunction with a cheap accuracy measure, is fast and effective at avoiding unexplained model behavior.
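To make the cost of cross-validation concrete, the following minimal sketch selects a single hyperparameter (the degree of a polynomial surrogate) by k-fold cross-validation on a small data set. It does not implement the auxiliary measure introduced in this paper; the polynomial model, the test function sin(3x), the sample size, and the helper names kfold_cv_error and select_degree are all assumptions chosen for illustration.

```python
import numpy as np

def kfold_cv_error(x, y, degree, k=5, seed=0):
    """Mean squared k-fold cross-validation error of a polynomial fit."""
    rng = np.random.default_rng(seed)
    # Shuffle the sample indices and split them into k disjoint folds.
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # One full model fit per fold: this is where the cost accumulates.
        coeffs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coeffs, x[test])
        errors.append(np.mean((pred - y[test]) ** 2))
    return float(np.mean(errors))

def select_degree(x, y, degrees, k=5):
    """Pick the degree with the lowest cross-validation error.

    Also returns the total number of model fits, k * len(degrees).
    """
    degrees = list(degrees)
    scores = {d: kfold_cv_error(x, y, d, k) for d in degrees}
    best = min(scores, key=scores.get)
    return best, scores, k * len(degrees)

# Sparse data: 12 samples of an assumed, cheap stand-in for a simulator.
x = np.linspace(-1.0, 1.0, 12)
y = np.sin(3.0 * x)

best_degree, cv_scores, n_fits = select_degree(x, y, range(1, 8))
print(f"selected degree: {best_degree}, model fits required: {n_fits}")
```

Each candidate hyperparameter value requires k model fits, so seven candidates with five folds already cost 35 fits. For surrogate types that are expensive to train, this multiplier is what makes a cheaper selection criterion attractive.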
