Surrogate Benchmarks for Hyperparameter Optimization

Since hyperparameter optimization is crucial for achieving peak performance with many machine learning algorithms, an active research community has formed around this problem in recent years. Evaluating new hyperparameter optimization techniques against the state of the art requires a set of benchmarks. Because such evaluations can be very expensive, early experiments are often performed on synthetic test functions rather than on real-world hyperparameter optimization problems. However, there can be a wide gap between the two kinds of problems. In this work, we introduce another option: cheap-to-evaluate surrogates of real hyperparameter optimization benchmarks that share the same hyperparameter spaces and feature similar response surfaces. Specifically, we train regression models on data describing a machine learning algorithm's performance under a wide range of hyperparameter configurations, and then cheaply evaluate hyperparameter optimization methods using the model's performance predictions in lieu of running the real algorithm. We evaluate the effectiveness of a wide range of regression techniques for building these surrogate benchmarks, both in terms of how well they predict the performance of new configurations and in terms of how much they affect the overall performance of hyperparameter optimizers. Overall, we found that surrogate benchmarks based on random forests performed best: for benchmarks with few hyperparameters they yielded almost perfect surrogates, and for benchmarks with more complex hyperparameter spaces they still yielded surrogates that were qualitatively similar to the real benchmarks they model.
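To illustrate the core idea, the following is a minimal sketch of a surrogate benchmark, assuming a recorded table of previously evaluated hyperparameter configurations and their validation errors; it uses scikit-learn's RandomForestRegressor as the regression model, and names such as make_surrogate and the toy data are illustrative rather than taken from the paper's code.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def make_surrogate(configs, errors):
    # Fit a regression surrogate on (configuration, validation error) pairs.
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(configs, errors)

    def surrogate_objective(config):
        # Predicting takes microseconds, versus minutes or hours for a real
        # training run of the underlying machine learning algorithm.
        return float(model.predict(np.asarray(config).reshape(1, -1))[0])

    return surrogate_objective

# Toy example: two hyperparameters (e.g. log learning rate, log regularization)
# with synthetic "observed" validation errors standing in for recorded runs.
rng = np.random.RandomState(0)
observed_configs = rng.uniform(-5, 0, size=(200, 2))
observed_errors = np.sin(observed_configs[:, 0]) ** 2 + 0.1 * observed_configs[:, 1] ** 2

objective = make_surrogate(observed_configs, observed_errors)
print(objective([-2.0, -1.0]))  # cheap stand-in for one hyperparameter evaluation

A hyperparameter optimizer under study would then query surrogate_objective in place of the real benchmark, keeping the same hyperparameter space while reducing evaluation cost to a model prediction.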
