Bag of Baselines for Multi-objective Joint Neural Architecture Search and Hyperparameter Optimization

Neural architecture search (NAS) and hyperparameter optimization (HPO) make deep learning accessible to non-experts by automatically finding the architecture of the deep neural network to use and tuning the hyperparameters of the training pipeline. While both NAS and HPO have been studied extensively in recent years, NAS methods typically assume fixed hyperparameters and vice versa; there exists little work on joint NAS + HPO. Furthermore, NAS has recently often been framed as a multi-objective optimization problem in order to account for, e.g., resource requirements. In this paper, we propose a set of methods that extend current approaches to jointly optimize neural architectures and hyperparameters with respect to multiple objectives. We hope that these methods will serve as simple baselines for future research on multi-objective joint NAS + HPO. To facilitate this, all our code is available at https://github.com/automl/multi-obj-baselines.

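To make the problem setting concrete, the sketch below illustrates what multi-objective joint NAS + HPO means in code: a single configuration mixes architectural choices with training hyperparameters, each configuration is scored on several objectives, and the outcome of the search is a Pareto front rather than a single best configuration. This is a minimal, hypothetical example (a random-search baseline with a toy surrogate objective), not the implementation or API of the linked repository; all function and parameter names are invented for illustration.

```python
# Illustrative sketch of multi-objective joint NAS + HPO (not the paper's code).
import random
from typing import Dict, List, Tuple

def sample_joint_config(rng: random.Random) -> Dict:
    """Draw one configuration from a joint NAS + HPO space (illustrative)."""
    return {
        # architectural choices (NAS part)
        "num_layers": rng.choice([2, 4, 6, 8]),
        "width": rng.choice([64, 128, 256]),
        # training hyperparameters (HPO part)
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "weight_decay": 10 ** rng.uniform(-6, -2),
    }

def evaluate(config: Dict, rng: random.Random) -> Tuple[float, float]:
    """Return (validation error, model size); a toy stand-in for a real training run."""
    size = config["num_layers"] * config["width"] ** 2          # proxy for #parameters
    error = rng.random() / config["num_layers"] + 1e4 / size    # toy surrogate objective
    return error, float(size)

def pareto_front(points: List[Tuple[Dict, Tuple[float, float]]]):
    """Keep configurations whose objective vectors are not dominated by any other."""
    front = []
    for cfg, obj in points:
        dominated = any(
            all(o_other <= o_self for o_self, o_other in zip(obj, other)) and other != obj
            for _, other in points
        )
        if not dominated:
            front.append((cfg, obj))
    return front

if __name__ == "__main__":
    rng = random.Random(0)
    configs = [sample_joint_config(rng) for _ in range(50)]
    evaluated = [(c, evaluate(c, rng)) for c in configs]
    # The result is a set of trade-off configurations, not a single winner.
    for cfg, (err, size) in pareto_front(evaluated):
        print(f"error={err:.3f}  size={size:.0f}  config={cfg}")
```

The key point is the return type of the search: because objectives such as validation error and resource consumption conflict, a multi-objective method reports the non-dominated set, and the user picks a trade-off from it afterwards.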