CMA-ES for Hyperparameter Optimization of Deep Neural Networks

Hyperparameters of deep neural networks are often optimized by grid search, random search, or Bayesian optimization. As an alternative, we propose to use the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which is known for its state-of-the-art performance in derivative-free optimization. CMA-ES has useful invariance properties and lends itself naturally to parallel evaluation of candidate solutions. We provide a toy example that compares CMA-ES with state-of-the-art Bayesian optimization algorithms for tuning the hyperparameters of a convolutional neural network on the MNIST dataset, using 30 GPUs in parallel.
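As an illustration of how such a setup can be driven in practice, the sketch below tunes two hyperparameters (learning rate and batch size, both searched in log scale) with the ask-and-tell interface of the pycma implementation of CMA-ES. This is a minimal sketch, not the paper's experimental code: the objective is a placeholder standing in for a full training run, and names such as train_and_evaluate, the bounds, and the search-space encoding are illustrative assumptions.

```python
# Minimal sketch of CMA-ES hyperparameter tuning via the ask-and-tell
# interface of the pycma package ("pip install cma").  The objective is a
# stand-in for an actual training run and the search space is illustrative.
import cma

def train_and_evaluate(params):
    """Hypothetical objective: decode hyperparameters, train the CNN on MNIST,
    and return the validation error.  A smooth placeholder is returned here so
    the sketch runs on its own."""
    log_lr, log2_batch = params                     # hyperparameters in log scale
    lr = 10.0 ** log_lr                             # learning rate (used by a real run)
    batch_size = int(round(2.0 ** log2_batch))      # mini-batch size (used by a real run)
    return (log_lr + 3.0) ** 2 + 0.1 * (log2_batch - 6.0) ** 2  # placeholder error

# Initial mean, initial step size, and box constraints for the two
# log-transformed hyperparameters; popsize 30 mirrors one candidate per GPU.
es = cma.CMAEvolutionStrategy(
    [-3.0, 6.0], 1.0,
    {'bounds': [[-5.0, 4.0], [-1.0, 9.0]], 'popsize': 30, 'maxiter': 20})

while not es.stop():
    candidates = es.ask()                           # one population per iteration
    # Each candidate could be trained on its own GPU; evaluated sequentially here.
    errors = [train_and_evaluate(x) for x in candidates]
    es.tell(candidates, errors)                     # update mean, step size, covariance
    es.disp()

print('best hyperparameters found:', es.result.xbest)
```

Because ask() returns the entire population before any evaluation is needed, the per-candidate trainings can be dispatched to separate GPUs and gathered before the single tell() call, which is the property that makes CMA-ES convenient for parallel hyperparameter evaluation.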
