Bayesian Optimization Combined with Successive Halving for Neural Network Architecture Optimization

The choice of hyperparameters and the selection of algorithms are crucial parts of machine learning. Bayesian optimization and successive halving have both been applied successfully to automate hyperparameter optimization. We therefore propose to combine the two methods by estimating the initial population of incremental evaluation, our variation of successive halving, by means of Bayesian optimization. We apply the proposed methodology to the challenging problem of automatically optimizing neural network architectures and investigate how state-of-the-art hyperparameter optimization methods perform on this task. In our evaluation, the automatic methods match human expert performance on the MNIST data set, but we are not able to achieve similarly good results on the CIFAR-10 data set. However, the automated methods find shallow convolutional neural networks that outperform human-crafted shallow networks with respect to both classification error and training time.
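To make the combination concrete, the following is a minimal sketch in Python of the idea described above: a Gaussian-process Bayesian optimization loop with expected improvement proposes a promising initial population from cheap low-budget evaluations, and that population is then pruned by plain successive halving (used here as a stand-in for the paper's incremental-evaluation variant). The toy objective, budgets, and all helper names are illustrative assumptions, not the authors' implementation.

```python
# Sketch: seed successive halving's initial population via Bayesian optimization.
# Everything below (objective, budgets, population sizes) is assumed for illustration.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def objective(x, budget):
    """Toy stand-in for the validation error of a network with hyperparameter x
    trained for `budget` epochs; error shrinks as the budget grows."""
    return (x - 0.3) ** 2 + 0.5 / budget + 0.01 * rng.standard_normal()

def expected_improvement(gp, X_cand, y_best):
    """EI acquisition for minimization: expected drop below the incumbent y_best."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bo_initial_population(n_init, n_pop, low_budget=1):
    """Run cheap, low-budget BO and return the n_pop most promising configs."""
    X = rng.uniform(0, 1, size=(n_init, 1))
    y = np.array([objective(x[0], low_budget) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_pop - n_init):
        gp.fit(X, y)
        cand = rng.uniform(0, 1, size=(256, 1))          # random candidate pool
        x_next = cand[np.argmax(expected_improvement(gp, cand, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next[0], low_budget))
    return X[np.argsort(y)[:n_pop]]

def successive_halving(population, min_budget=1, eta=2):
    """Evaluate all configs on a small budget, keep the best 1/eta fraction,
    multiply the budget by eta, and repeat until one configuration remains."""
    budget = min_budget
    while len(population) > 1:
        scores = np.array([objective(x[0], budget) for x in population])
        population = population[np.argsort(scores)[: max(1, len(population) // eta)]]
        budget *= eta
    return population[0]

best = successive_halving(bo_initial_population(n_init=5, n_pop=16))
print("selected configuration:", best)
```

The design point the sketch illustrates is the division of labor: Bayesian optimization spends the surrogate's sample efficiency on choosing *where* to start, while successive halving spends the training budget on deciding *how long* each candidate deserves to run.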
