Hyperparameter Transfer Across Developer Adjustments

After developer adjustments to a machine learning (ML) algorithm, how can the results of an old hyperparameter optimization (HPO) automatically be used to speed up a new HPO? This is a challenging problem, as developer adjustments can change which hyperparameter settings perform well, or even the hyperparameter search space itself. While many approaches exist that leverage knowledge obtained on previous tasks, knowledge from previous development steps has so far remained entirely untapped. In this work, we remedy this situation and propose a new research framework: hyperparameter transfer across adjustments (HT-AA). To lay a solid foundation for this framework, we provide four simple HT-AA baseline algorithms and eight benchmarks that cover changes to various aspects of ML algorithms, their hyperparameter search spaces, and the neural architectures used. On average, and depending on the budgets for the old and new HPO, the best baseline reaches a given performance 1.2--2.6x faster than a prominent HPO algorithm without transfer. As HPO is a crucial step in ML development but requires extensive computational resources, this speedup leads to faster development cycles, lower costs, and a reduced environmental impact. To make these benefits available to ML developers off-the-shelf and to facilitate future research on HT-AA, we provide Python packages for our baselines and benchmarks.
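To make the transfer idea concrete, the simplest conceivable HT-AA strategy re-evaluates the old HPO's incumbent before falling back to ordinary search. The sketch below is purely illustrative and not taken from the paper: all names (`warm_started_random_search`, `new_space`, `old_results`) are hypothetical, it assumes discrete search spaces and a loss to be minimized, and the baselines studied in the paper may differ.

```python
import random

def warm_started_random_search(objective, new_space, old_results, budget):
    """Toy HT-AA baseline (illustrative only): evaluate the best
    configuration from the old HPO first, projected onto the new search
    space, then continue with plain random search.

    new_space:   dict mapping hyperparameter names to lists of choices
    old_results: list of dicts like {"config": {...}, "loss": float}
    """
    # Incumbent of the old HPO run (lowest observed loss).
    best_old = min(old_results, key=lambda r: r["loss"])["config"]

    def sample():
        return {name: random.choice(choices) for name, choices in new_space.items()}

    # Project the old incumbent onto the new space; hyperparameters that
    # the developer adjustment added or changed are sampled fresh.
    first = sample()
    first.update({k: v for k, v in best_old.items()
                  if k in new_space and v in new_space[k]})

    evaluations = []
    for i in range(budget):
        config = first if i == 0 else sample()
        evaluations.append((config, objective(config)))
    return min(evaluations, key=lambda e: e[1])
```

Even this trivial warm start illustrates the two core difficulties the framework names: deciding how to map old configurations into a possibly changed search space, and deciding how much of the new budget to spend exploiting old knowledge versus exploring anew.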
