AutoHAS: Efficient Hyperparameter and Architecture Search

Deep learning models often require extensive effort to optimize their hyperparameters and architectures. Standard hyperparameter optimization methods are expensive because of their multi-trial nature: different configurations are trained separately to find the best one. In this paper, we propose AutoHAS, an efficient framework for joint hyperparameter and architecture search. AutoHAS generalizes efficient architecture search methods such as ENAS and DARTS to hyperparameter search, and can therefore jointly optimize both in a single training run. A key challenge in this generalization is that ENAS and DARTS are designed for discrete architecture choices, whereas hyperparameter choices are often continuous. To tackle this challenge, we discretize each continuous hyperparameter space into a linear combination of categorical basis values. Furthermore, we extend the idea of weight sharing and augment it with REINFORCE to reduce memory cost. To decouple the optimization of the shared network weights from that of the controller, we also propose creating temporary weights for evaluating the sampled hyperparameters and updating the controller. Experimental results show that AutoHAS improves ImageNet accuracy by up to 0.8% for highly optimized state-of-the-art ResNet/EfficientNet models, and by up to 11% for less-optimized models. Compared to random search and Bayesian search, AutoHAS consistently achieves better accuracy at 10x lower computation cost.
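For intuition, the sketch below (a minimal illustration, not the authors' implementation) shows how a continuous hyperparameter such as the learning rate can be expressed over a categorical basis: the controller keeps one logit per candidate value and either mixes the candidates with softmax weights (a differentiable, DARTS-style relaxation) or samples one candidate from the softmax distribution (for REINFORCE-style updates). All names here (basis, logits, expected_hp, sample_hp) are hypothetical.

import numpy as np

# Candidate learning rates form the categorical basis; the controller
# learns one logit per basis value.
basis = np.array([0.001, 0.01, 0.1, 1.0])
logits = np.zeros_like(basis)  # uniform at initialization

def softmax(x):
    w = np.exp(x - np.max(x))
    return w / w.sum()

def expected_hp(logits, basis):
    # Differentiable relaxation: softmax-weighted linear combination of the basis.
    return float(np.dot(softmax(logits), basis))

def sample_hp(logits, basis, rng):
    # REINFORCE-style alternative: sample a single basis value from the softmax.
    idx = rng.choice(len(basis), p=softmax(logits))
    return float(basis[idx]), int(idx)

rng = np.random.default_rng(0)
print("expected lr:", expected_hp(logits, basis))   # mixture under uniform logits
print("sampled lr:", sample_hp(logits, basis, rng)[0])

In AutoHAS proper, the mixed or sampled hyperparameter would then be applied to temporary copies of the shared weights, so that the validation signal used to update the controller does not perturb the shared weights themselves.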

[1] Ron Kohavi, et al. Automatic Parameter Selection by Minimizing Estimated Error, 1995, ICML.

[2] Donald R. Jones, et al. Efficient Global Optimization of Expensive Black-Box Functions, 1998, J. Glob. Optim.

[3] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.

[4] Frank Hutter, et al. Automated configuration of algorithms for solving hard computational problems, 2009.

[5] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.

[6] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[7] Andrew Zisserman, et al. Delving deeper into the whorl of flower segmentation, 2010, Image Vis. Comput.

[8] Kevin Leyton-Brown, et al. Sequential Model-Based Optimization for General Algorithm Configuration, 2011, LION.

[9] Yoshua Bengio, et al. Random Search for Hyper-Parameter Optimization, 2012, J. Mach. Learn. Res.

[10] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.

[11] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[12] Kevin Leyton-Brown, et al. Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms, 2012, KDD.

[13] Jonathan Krause, et al. 3D Object Representations for Fine-Grained Categorization, 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[14] Krista A. Ehinger, et al. SUN Database: Exploring a Large Collection of Scene Categories, 2014, International Journal of Computer Vision.

[15] Prabhat, et al. Scalable Bayesian Optimization Using Deep Neural Networks, 2015, ICML.

[16] Ryan P. Adams, et al. Gradient-based Hyperparameter Optimization through Reversible Learning, 2015, ICML.

[17] Frank Hutter, et al. Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves, 2015, IJCAI.

[18] Nando de Freitas, et al. Taking the Human Out of the Loop: A Review of Bayesian Optimization, 2016, Proceedings of the IEEE.

[19] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Fabian Pedregosa, et al. Hyperparameter optimization with approximate gradient, 2016, ICML.

[21] Quoc V. Le, et al. Neural Architecture Search with Reinforcement Learning, 2016, ICLR.

[22] Ramesh Raskar, et al. Designing Neural Network Architectures using Reinforcement Learning, 2016, ICLR.

[23] Ameet Talwalkar, et al. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization, 2016, J. Mach. Learn. Res.

[24] D. Sculley, et al. Google Vizier: A Service for Black-Box Optimization, 2017, KDD.

[25] Max Jaderberg, et al. Population Based Training of Neural Networks, 2017, ArXiv.

[26] Mark W. Schmidt, et al. Online Learning Rate Adaptation with Hypergradient Descent, 2017, ICLR.

[27] Aaron Klein, et al. Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search, 2018, ArXiv.

[28] Vijay Vasudevan, et al. Learning Transferable Architectures for Scalable Image Recognition, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29] Mark Sandler, et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30] Bolei Zhou, et al. Places: A 10 Million Image Database for Scene Recognition, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31] Quoc V. Le, et al. Efficient Neural Architecture Search via Parameter Sharing, 2018, ICML.

[32] Liang Lin, et al. SNAS: Stochastic Neural Architecture Search, 2018, ICLR.

[33] Yuandong Tian, et al. FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.

[35] Alok Aggarwal, et al. Regularized Evolution for Image Classifier Architecture Search, 2018, AAAI.

[36] Aaron Klein, et al. Tabular Benchmarks for Joint Architecture and Hyperparameter Optimization, 2019, ArXiv.

[37] Bo Chen, et al. MnasNet: Platform-Aware Neural Architecture Search for Mobile, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38] Byron Boots, et al. Truncated Back-propagation for Bilevel Optimization, 2018, AISTATS.

[39] Yi Yang, et al. Network Pruning via Transformable Architecture Search, 2019, NeurIPS.

[40] Yiming Yang, et al. DARTS: Differentiable Architecture Search, 2018, ICLR.

[41] Song Han, et al. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, 2018, ICLR.

[42] Yi Yang, et al. Searching for a Robust Neural Architecture in Four GPU Hours, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43] Bo Chen, et al. Can Weight Sharing Outperform Random Architecture Search? An Investigation With TuNAS, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44] Chuang Gan, et al. Once for All: Train One Network and Specialize it for Efficient Deployment, 2019, ICLR.

[45] David Duvenaud, et al. Optimizing Millions of Hyperparameters by Implicit Differentiation, 2019, AISTATS.

[46] Yuandong Tian, et al. FBNetV3: Joint Architecture-Recipe Search using Neural Acquisition Function, 2020, ArXiv.

[47] Yuandong Tian, et al. FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48] Chen Lei, et al. Automated Machine Learning, 2021, Cognitive Intelligence and Robotics.