Bayesian Optimization for Iterative Learning

The performance of deep (reinforcement) learning systems crucially depends on the choice of hyperparameters. Tuning them is notoriously expensive, typically requiring an iterative training process to be run for many steps until convergence. Traditional tuning algorithms consider only the final performance of a hyperparameter setting, obtained after many expensive training iterations, and ignore intermediate information from earlier training steps. In this paper, we present a Bayesian optimization (BO) approach that exploits the iterative structure of learning algorithms for efficient hyperparameter tuning. We propose to learn an evaluation function that compresses learning progress at any stage of the training process into a single numeric score reflecting both training success and stability. Our BO framework then trades off the benefit of assessing a hyperparameter setting over additional training steps against the associated computational cost. We further increase model efficiency by selectively including scores from different training steps for each evaluated hyperparameter setting. We demonstrate the efficiency of our algorithm by tuning hyperparameters for the training of deep reinforcement learning agents and convolutional neural networks. Our algorithm outperforms existing baselines in identifying well-performing hyperparameters in minimal time.
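
To make the scoring idea concrete, the snippet below is a minimal, hedged sketch of one way such an evaluation function could look: it compresses a partial learning curve into a single score that rewards recent training success and penalizes instability. The function name curve_score and the parameters window and stability_weight are illustrative assumptions, not the paper's actual formulation.

    # Illustrative sketch only: compress a partial learning curve into one score
    # that combines training success (recent mean performance) with stability
    # (penalized fluctuation). Names and weighting are assumptions for exposition.
    import numpy as np

    def curve_score(rewards, window=10, stability_weight=0.5):
        """Map a partial learning curve to a single numeric score.

        rewards: per-step (or per-episode) performance values observed so far.
        window: number of most recent values used to estimate current success.
        stability_weight: how strongly variability within the window is penalized.
        """
        rewards = np.asarray(rewards, dtype=float)
        recent = rewards[-window:]          # most recent slice of training progress
        success = recent.mean()             # how well the run is doing now
        instability = recent.std()          # how noisy that progress is
        return success - stability_weight * instability

    if __name__ == "__main__":
        # Two runs with similar mean performance: the stable run scores higher,
        # so it would be favored when deciding where to spend further training.
        stable = [0.0, 0.2, 0.4, 0.5, 0.55, 0.6, 0.62, 0.63, 0.64, 0.65]
        noisy = [0.0, 0.9, 0.1, 0.8, 0.2, 1.0, 0.3, 0.9, 0.2, 1.0]
        print(curve_score(stable), curve_score(noisy))

Under this kind of scoring, a BO procedure can compare partially trained configurations at different numbers of training steps and weigh the expected gain of training longer against its computational cost.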
