Transferable Neural Processes for Hyperparameter Optimization

Automated machine learning aims to automate the entire machine learning pipeline, including model configuration. In this paper, we focus on automated hyperparameter optimization (HPO) based on sequential model-based optimization (SMBO). Although conventional SMBO algorithms work well when abundant HPO trials are available, they are far from satisfactory in practical applications, where a single trial on a huge dataset may be so costly that a near-optimal hyperparameter configuration must be found in as few trials as possible. Observing that human experts draw on their expertise with a machine learning model by trying configurations that have performed well on other datasets, we are inspired to speed up HPO by transferring knowledge from historical HPO trials on other datasets. We propose an end-to-end and efficient HPO algorithm named Transfer Neural Processes (TNP), which achieves transfer learning by incorporating trials on other datasets, initializing the model with well-generalized parameters, and learning an initial set of hyperparameters to evaluate. Extensive experiments on OpenML datasets and three computer vision datasets show that the proposed model achieves state-of-the-art performance with at least one order of magnitude fewer trials.
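
To make the setting concrete, the sketch below shows a generic SMBO loop with the three transfer ingredients described above plugged in as placeholders. All names here (Surrogate, expected_improvement, smbo, run_trial, warm_start, other_dataset_trials) are illustrative assumptions, not the paper's implementation; in TNP the surrogate is a neural process meta-trained on historical trials, whereas this sketch uses a trivial nearest-neighbour stand-in.

```python
"""Minimal SMBO sketch with placeholder transfer components (assumptions, not TNP's API)."""
import math


class Surrogate:
    """Stand-in for the TNP surrogate. In the paper this would be a neural process
    conditioned on, and meta-trained over, HPO trials from other datasets; here it
    is a crude nearest-neighbour predictor over observed (config, score) pairs."""

    def __init__(self, history=()):
        # `history` holds (config, score) pairs from trials on other datasets (assumption).
        self.obs = list(history)

    def update(self, config, score):
        self.obs.append((config, score))

    def predict(self, config):
        # Return a (mean, uncertainty) guess; nearest neighbour as a toy predictive model.
        if not self.obs:
            return 0.5, 1.0
        dists = [sum((a - b) ** 2 for a, b in zip(c, config)) ** 0.5 for c, _ in self.obs]
        i = min(range(len(dists)), key=dists.__getitem__)
        return self.obs[i][1], dists[i] + 1e-3


def expected_improvement(mu, sigma, best):
    # Standard EI acquisition for minimization; the paper's acquisition choice may differ.
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def smbo(run_trial, candidates, warm_start, other_dataset_trials, budget=10):
    """run_trial(config) -> validation error on the target dataset (expensive)."""
    surrogate = Surrogate(other_dataset_trials)            # transfer: trials on other datasets
    evaluated = [(c, run_trial(c)) for c in warm_start]    # transfer: learned initial configs
    for c, s in evaluated:
        surrogate.update(c, s)
    while len(evaluated) < budget:
        best = min(s for _, s in evaluated)
        tried = {c for c, _ in evaluated}
        scored = [(expected_improvement(*surrogate.predict(c), best), c)
                  for c in candidates if c not in tried]
        if not scored:
            break
        _, nxt = max(scored)                                # pick the most promising config
        score = run_trial(nxt)
        evaluated.append((nxt, score))
        surrogate.update(nxt, score)
    return min(evaluated, key=lambda t: t[1])
```

The three transfer ingredients from the abstract map onto other_dataset_trials (historical HPO trials), the surrogate's pretrained parameters, and warm_start (the learned initial configurations); in the actual TNP these are learned jointly end-to-end rather than hand-wired as separate arguments.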
