Bayesian optimization for conditional hyperparameter spaces

Hyperparameter optimization is now widely used to tune learning algorithms. Hyperparameter spaces can be structured: some hyperparameters are conditional, meaning they are active only for certain values of other hyperparameters. We target combined algorithm selection and hyperparameter optimization, which inherently involves conditional hyperparameters: the choice of learning algorithm determines which hyperparameters are active. In this work, we show that Bayesian optimization with Gaussian processes can optimize conditional spaces when knowledge of the conditional structure is injected into the kernel. We propose and study the behavior of two kernels: a conditional kernel, which forces the similarity between two samples from different condition branches to be zero, and the Laplace kernel, motivated by its connections to Mondrian processes and random forests. We demonstrate the benefit of these kernels, together with proper imputation of inactive hyperparameters, on a benchmark of scikit-learn models.
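
As a rough illustration of the two kernels, here is a minimal NumPy sketch. The encoding is an assumption made for this sketch, not the paper's exact scheme: each sample is a vector whose first entry indexes the condition branch (the chosen algorithm) and whose remaining entries are hyperparameter values, with inactive ones imputed to a fixed default.

```python
import numpy as np

def conditional_kernel(x, y, lengthscale=1.0):
    """Zero similarity across condition branches; a squared-exponential
    kernel on the hyperparameters within a shared branch.
    (Illustrative form, not the paper's exact definition.)"""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if x[0] != y[0]:  # samples lie in different condition branches
        return 0.0
    sq_dist = np.sum((x[1:] - y[1:]) ** 2)
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)

def laplace_kernel(x, y, lam=1.0):
    """Laplace kernel exp(-lam * ||x - y||_1), the limit of the Mondrian
    kernel as the number of Mondrian partitions grows, hence its link to
    Mondrian processes and random forests."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.exp(-lam * np.sum(np.abs(x - y)))

# Identical hyperparameters, different vs. same algorithm:
print(conditional_kernel([0, 0.5, -1.2], [1, 0.5, -1.2]))  # 0.0
print(conditional_kernel([0, 0.5, -1.2], [0, 0.5, -1.2]))  # 1.0
```

Zeroing cross-branch similarity means observations made under one algorithm never influence the Gaussian-process posterior for another, whereas the Laplace kernel measures similarity over the full (imputed) vector through its axis-aligned, partition-like structure.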
