HyperNOMAD

The performance of deep neural networks is highly sensitive to the choice of the hyperparameters that define the structure of the network and the learning process. When facing a new application, tuning a deep neural network is a tedious and time-consuming process that is often described as a "dark art", which motivates automating the calibration of these hyperparameters. Derivative-free optimization is a field that develops methods designed to optimize time-consuming functions without relying on derivatives. This work introduces the HyperNOMAD package, an extension of the NOMAD software that applies the MADS algorithm [7] to simultaneously tune the hyperparameters responsible for both the architecture and the learning process of a deep neural network (DNN), and that allows for considerable flexibility in the exploration of the search space by taking advantage of categorical variables. This new approach is tested on the MNIST and CIFAR-10 data sets and achieves results comparable to the current state of the art.
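
To make the blackbox view concrete, the sketch below shows the kind of evaluation function a derivative-free tuner would repeatedly call: a candidate point mixes architectural hyperparameters (number of convolutional layers, channel width), a categorical choice of optimizer, and continuous learning hyperparameters, and the returned validation error is the scalar objective to minimize. This is a minimal illustration written with PyTorch under assumed, hypothetical names such as `evaluate_hyperparameters`; it is not HyperNOMAD's actual interface.

```python
# Minimal, illustrative blackbox for derivative-free hyperparameter tuning
# (hypothetical names; not HyperNOMAD's actual interface). A DFO method such
# as MADS would repeatedly evaluate this function and minimize its output.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms


def build_cnn(n_conv_layers: int, n_channels: int) -> nn.Module:
    """Architectural hyperparameters define the network structure."""
    layers, in_ch, spatial = [], 1, 28  # MNIST images are 1 x 28 x 28
    for _ in range(n_conv_layers):
        layers += [nn.Conv2d(in_ch, n_channels, 3, padding=1),
                   nn.ReLU(), nn.MaxPool2d(2)]
        in_ch, spatial = n_channels, spatial // 2
    layers += [nn.Flatten(), nn.Linear(in_ch * spatial * spatial, 10)]
    return nn.Sequential(*layers)


def evaluate_hyperparameters(n_conv_layers: int, n_channels: int,
                             optimizer_name: str, lr: float,
                             batch_size: int) -> float:
    """Train briefly and return the validation error (the blackbox objective)."""
    tfm = transforms.ToTensor()
    train = Subset(datasets.MNIST("data", train=True, download=True, transform=tfm),
                   range(5000))  # small subset to keep the sketch cheap
    val = datasets.MNIST("data", train=False, download=True, transform=tfm)
    model = build_cnn(n_conv_layers, n_channels)
    # The optimizer is a categorical variable: its value also determines which
    # continuous hyperparameters (here only the learning rate) are relevant.
    opt_cls = {"sgd": torch.optim.SGD, "adam": torch.optim.Adam}[optimizer_name]
    opt = opt_cls(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for x, y in DataLoader(train, batch_size=batch_size, shuffle=True):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    model.eval()
    correct = 0
    with torch.no_grad():
        for x, y in DataLoader(val, batch_size=256):
            correct += (model(x).argmax(1) == y).sum().item()
    return 1.0 - correct / len(val)


# One candidate point of the mixed search space:
# err = evaluate_hyperparameters(2, 16, "adam", lr=1e-3, batch_size=128)
```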

[1] Ramesh Raskar, et al. Designing Neural Network Architectures using Reinforcement Learning, 2016, ICLR.

[2] Sébastien Le Digabel, et al. Algorithm xxx: NOMAD: Nonlinear Optimization with the MADS algorithm, 2010.

[3] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[4] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.

[5] Frank Hutter, et al. Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution, 2018, ICLR.

[6] Charles Audet, et al. The Mesh Adaptive Direct Search Algorithm for Granular and Discrete Variables, 2018, SIAM J. Optim.

[7] Charles Audet, et al. Finding Optimal Algorithmic Parameters Using Derivative-Free Optimization, 2006, SIAM J. Optim.

[8] Katya Scheinberg, et al. Introduction to derivative-free optimization, 2010, Math. Comput.

[9] Roland Vollgraf, et al. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, 2017, ArXiv.

[10] Steven R. Young, et al. Optimizing deep learning hyper-parameters through an evolutionary algorithm, 2015, MLHPC@SC.

[11] Charles Audet, et al. Mesh Adaptive Direct Search Algorithms for Constrained Optimization, 2006, SIAM J. Optim.

[12] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[13] Masanori Suganuma, et al. A genetic programming approach to designing convolutional neural network architectures, 2017, GECCO.

[14] M. Abramson. Mixed Variable Optimization of a Load-Bearing Thermal Insulation System Using a Filter Pattern Search Algorithm, 2004.

[15] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.

[16] Lars Schmidt-Thieme, et al. Scalable Gaussian process-based transfer surrogates for hyperparameter optimization, 2017, Machine Learning.

[17] Alceu de Souza Britto, et al. A Novel Orthogonal Direction Mesh Adaptive Direct Search Approach for SVM Hyperparameter Tuning, 2019, ArXiv.

[18] David D. Cox, et al. Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures, 2013, ICML.

[19] Charles Audet, et al. Pattern Search Algorithms for Mixed Variable Programming, 2000, SIAM J. Optim.

[20] Léon Bottou, et al. Stochastic Gradient Descent Tricks, 2012, Neural Networks: Tricks of the Trade.

[21] Ameet Talwalkar, et al. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization, 2016, J. Mach. Learn. Res.

[22] Robert Hooke, et al. "Direct Search" Solution of Numerical and Statistical Problems, 1961, JACM.

[23] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.

[24] Tom Bosc, et al. Learning to Learn Neural Networks, 2016, ArXiv.

[25] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.

[26] J. Dennis, et al. Mixed Variable Optimization of the Number and Composition of Heat Intercepts in a Thermal Insulation System, 2001.

[27] Alok Aggarwal, et al. Regularized Evolution for Image Classifier Architecture Search, 2018, AAAI.

[28] Sergey Levine, et al. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, 2016, Int. J. Robotics Res.

[29] Frank Hutter, et al. Neural Architecture Search: A Survey, 2018, J. Mach. Learn. Res.

[30] Aaron Klein, et al. Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search, 2018, ArXiv.

[31] Yoshua Bengio, et al. Practical Recommendations for Gradient-Based Training of Deep Architectures, 2012, Neural Networks: Tricks of the Trade.

[32] Yoshua Bengio, et al. Random Search for Hyper-Parameter Optimization, 2012, J. Mach. Learn. Res.

[33] M. Powell. The BOBYQA algorithm for bound constrained optimization without derivatives, 2009.

[34] Keiron O'Shea, et al. An Introduction to Convolutional Neural Networks, 2015, ArXiv.

[35] Quoc V. Le, et al. Neural Architecture Search with Reinforcement Learning, 2016, ICLR.

[36] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.

[37] Sébastien Le Digabel, et al. A Taxonomy of Constraints in Simulation-Based Optimization, 2015, arXiv:1505.07881.

[38] Prasanna Balaprakash, et al. DeepHyper: Asynchronous Hyperparameter Search for Deep Neural Networks, 2018, IEEE 25th International Conference on High Performance Computing (HiPC).

[39] J. Dennis, et al. Filter Pattern Search Algorithms for Mixed Variable Constrained Optimization Problems, 2004.

[40] Katya Scheinberg, et al. Black-Box Optimization in Machine Learning with Trust Region Based Derivative Free Algorithm, 2017, ArXiv.

[41] Charles Audet, et al. Mesh adaptive direct search algorithms for mixed variable optimization, 2007, Optim. Lett.

[42] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[43] Sébastien Le Digabel. NOMAD: Nonlinear Optimization with the MADS Algorithm, 2009.

[44] Charles Audet, et al. Nonsmooth optimization through Mesh Adaptive Direct Search and Variable Neighborhood Search, 2006, J. Glob. Optim.

[45] Charles Audet, et al. Optimization of algorithms with OPAL, 2012, Math. Program. Comput.

[46] Guang Yang, et al. Neural networks designing neural networks: Multi-objective hyper-parameter optimization, 2016, IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[47] Charles Audet, et al. Derivative-Free and Blackbox Optimization, 2017.

[49] José Ranilla, et al. Particle swarm optimization for hyper-parameter selection in deep neural networks, 2017, GECCO.

[50] Katya Scheinberg, et al. Global Convergence of General Derivative-Free Trust-Region Algorithms to First- and Second-Order Critical Points, 2009, SIAM J. Optim.

[51] Yoshua Bengio, et al. Algorithms for Hyper-Parameter Optimization, 2011, NIPS.

[52] D. Sculley, et al. Google Vizier: A Service for Black-Box Optimization, 2017, KDD.

[53] Nikolaos Ploskas, et al. Tuning BARON using derivative-free optimization algorithms, 2019, J. Glob. Optim.

[54] Kevin Leyton-Brown, et al. Sequential Model-Based Optimization for General Algorithm Configuration, 2011, LION.

[55] Virginia Torczon. On the Convergence of Pattern Search Algorithms, 1997, SIAM J. Optim.

[57] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[58] Aaron Klein, et al. Efficient and Robust Automated Machine Learning, 2015, NIPS.

[59] Charles Audet, et al. Mesh-based Nelder–Mead algorithm for inequality constrained optimization, 2017, Computational Optimization and Applications.

[60] Bram van Ginneken, et al. A survey on deep learning in medical image analysis, 2017, Medical Image Anal.

[61] Frank Hutter, et al. CMA-ES for Hyperparameter Optimization of Deep Neural Networks, 2016, ArXiv.

[62] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[63] Vijay Vasudevan, et al. Learning Transferable Architectures for Scalable Image Recognition, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[64] Philippe L. Toint, et al. BFO, A Trainable Derivative-free Brute Force Optimizer for Nonlinear Bound-constrained Optimization and Equilibrium Computations with Continuous and Discrete Variables, 2017, ACM Trans. Math. Softw.

[65] Achille Fokoue, et al. An effective algorithm for hyperparameter optimization of neural networks, 2017, IBM J. Res. Dev.