BANANAS: Bayesian Optimization with Neural Networks for Neural Architecture Search

Neural Architecture Search (NAS) has seen an explosion of research in the past few years, with proposed methods spanning reinforcement learning, Bayesian optimization with a Gaussian process model, evolutionary search, and gradient descent. In this work, we design a NAS algorithm that performs Bayesian optimization using a neural network model. To featurize the architectures used to train this model, we develop a path-based encoding scheme, which is particularly effective in cell-based search spaces. After training on just 200 random neural architectures, we are able to predict the validation accuracy of a new architecture to within one percent of its true accuracy on average; this predictor may be of independent interest beyond Bayesian neural architecture search. We test our algorithm on the NASBench dataset (Ying et al. 2019) and show that it significantly outperforms other NAS methods including evolutionary search, reinforcement learning, and AlphaX (Wang et al. 2019): it is over 100x more efficient than random search and 3.8x more efficient than the next-best algorithm. We also test our algorithm on the search space used in DARTS (Liu et al. 2018) and show that it is competitive with state-of-the-art NAS algorithms on this search space.
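
To make the path-based encoding concrete, below is a minimal sketch in Python, assuming a NAS-Bench-101-style cell: a DAG given as an upper-triangular adjacency matrix, with an operation label on each intermediate node. The names here (OPS, get_paths, path_encoding) and the DAG representation are illustrative assumptions, not the paper's actual implementation.

```python
from itertools import product

# Candidate operations for intermediate nodes (illustrative; NAS-Bench-101
# uses these three operations).
OPS = ["conv3x3-bn-relu", "conv1x1-bn-relu", "maxpool3x3"]


def get_paths(adjacency, node_ops):
    """Enumerate every input->output path as a tuple of operations.

    adjacency[i][j] == 1 means a directed edge from node i to node j;
    node 0 is the cell input and the last node is the cell output.
    node_ops holds the operation label of each node.
    """
    n = len(adjacency)
    paths = []

    def dfs(node, ops_so_far):
        if node == n - 1:  # reached the output node
            paths.append(tuple(ops_so_far))
            return
        for nxt in range(node + 1, n):  # the DAG is upper-triangular
            if adjacency[node][nxt]:
                extra = [] if nxt == n - 1 else [node_ops[nxt]]
                dfs(nxt, ops_so_far + extra)

    dfs(0, [])
    return paths


def path_encoding(adjacency, node_ops, max_path_len):
    """Binary-encode which operation sequences occur as input->output paths."""
    # Index every possible operation sequence of length 0..max_path_len.
    all_seqs = [seq for length in range(max_path_len + 1)
                for seq in product(OPS, repeat=length)]
    index = {seq: i for i, seq in enumerate(all_seqs)}
    encoding = [0] * len(all_seqs)
    for path in get_paths(adjacency, node_ops):
        encoding[index[path]] = 1
    return encoding


# Tiny example: input -> conv3x3 -> output, plus a skip connection
# directly from input to output.
adj = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
ops = ["input", "conv3x3-bn-relu", "output"]
print(path_encoding(adj, ops, max_path_len=1))  # [1, 1, 0, 0]
```

With q candidate operations and paths of length at most L, this encoding has sum over i = 0..L of q^i entries, which stays manageable for cells of the size used in NAS-Bench-101.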

[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.

[2] Li Fei-Fei, et al. Progressive Neural Architecture Search, 2017, ECCV.

[3] Ameet Talwalkar, et al. Random Search and Reproducibility for Neural Architecture Search, 2019, UAI.

[4] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.

[5] Hanxiao Liu, et al. DARTS: Differentiable Architecture Search, 2018, ICLR.

[6] Zhiyuan Liu, et al. Graph Neural Networks: A Review of Methods and Applications, 2018, AI Open.

[7] Ameet Talwalkar, et al. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization, 2016, J. Mach. Learn. Res.

[8] Andreas Nürnberger, et al. The Power of Ensembles for Active Learning in Image Classification, 2018, CVPR.

[9] Yoshua Bengio, et al. Algorithms for Hyper-Parameter Optimization, 2011, NIPS.

[10] Harold J. Kushner. A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise, 1964.

[11] Ramesh Raskar, et al. Accelerating Neural Architecture Search using Performance Prediction, 2017, ICLR.

[12] Quoc V. Le, et al. Efficient Neural Architecture Search via Parameter Sharing, 2018, ICML.

[13] Andreas Krause, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.

[14] Kirthevasan Kandasamy, et al. ProBO: a Framework for Using Probabilistic Programming in Bayesian Optimization, 2019, arXiv.

[15] Geoffrey E. Hinton, et al. ImageNet Classification with Deep Convolutional Neural Networks, 2012, Commun. ACM.

[16] Quoc V. Le, et al. Neural Architecture Search with Reinforcement Learning, 2016, ICLR.

[17] Wei Pan, et al. BayesNAS: A Bayesian Approach for Neural Architecture Search, 2019, ICML.

[18] Mehryar Mohri, et al. AdaNet: Adaptive Structural Learning of Artificial Neural Networks, 2016, ICML.

[19] Evolutionary-Neural Hybrid Agents, 2018.

[20] Peter I. Frazier. A Tutorial on Bayesian Optimization, 2018, arXiv.

[21] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2017, CVPR.

[22] Frank Hutter, et al. Neural Architecture Search: A Survey, 2018, J. Mach. Learn. Res.

[23] Sergey Ioffe, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 2016, AAAI.

[24] Linnan Wang, et al. AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search, 2019, arXiv.

[25] C. Archambeau, et al. Multiple Adaptive Bayesian Linear Regression for Scalable Bayesian Optimization with Warm Start, 2017, arXiv:1712.02902.

[26] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2016, CVPR.

[27] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.

[28] Liang Lin, et al. SNAS: Stochastic Neural Architecture Search, 2018, ICLR.

[29] Jonas Mockus. On Bayesian Methods for Seeking the Extremum, 1974, Optimization Techniques.

[30] Oriol Vinyals, et al. Hierarchical Representations for Efficient Architecture Search, 2017, ICLR.

[31] Chris Ying, et al. NAS-Bench-101: Towards Reproducible Neural Architecture Search, 2019, ICML.

[32] Qingquan Song, et al. Auto-Keras: Efficient Neural Architecture Search with Network Morphism, 2018.

[33] Constantine Bekas, et al. TAPAS: Train-less Accuracy Predictor for Architecture Search, 2018, AAAI.

[34] Aaron Klein, et al. Bayesian Optimization with Robust Bayesian Neural Networks, 2016, NIPS.

[35] Frank Hutter, et al. Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves, 2015, IJCAI.

[36] Joong-Ho Won, et al. Ensemble of Deep Convolutional Neural Networks for Prognosis of Ischemic Stroke, 2016, BrainLes@MICCAI.

[37] Ramesh Raskar, et al. Designing Neural Network Architectures using Reinforcement Learning, 2016, ICLR.

[38] Seo-Young Noh, et al. AmoebaNet: An SDN-Enabled Network Service for Big Data Science, 2018, J. Netw. Comput. Appl.

[39] Dario Floreano, et al. Neuroevolution: From Architectures to Learning, 2008, Evol. Intell.

[40] Nan Jiang, et al. Contextual Decision Processes with Low Bellman Rank are PAC-Learnable, 2016, ICML.

[41] Risto Miikkulainen, et al. Evolving Neural Networks through Augmenting Topologies, 2002, Evolutionary Computation.

[42] Kirthevasan Kandasamy, et al. Neural Architecture Search with Bayesian Optimisation and Optimal Transport, 2018, NeurIPS.

[43] Marius Lindauer, et al. Best Practices for Scientific Research on Neural Architecture Search, 2019, arXiv.

[44] Prabhat, et al. Scalable Bayesian Optimization Using Deep Neural Networks, 2015, ICML.

[45] Yuandong Tian, et al. Sample-Efficient Neural Architecture Search by Learning Action Space, 2019, arXiv.

[46] Shifeng Zhang, et al. DARTS+: Improved Differentiable Architecture Search with Early Stopping, 2019, arXiv.

[47] Carl E. Rasmussen, et al. Gaussian Processes for Machine Learning, 2005, Adaptive Computation and Machine Learning.

[48] Aaron Klein, et al. Learning Curve Prediction with Bayesian Neural Networks, 2016, ICLR.

[49] Martin Jaggi, et al. Evaluating the Search Phase of Neural Architecture Search, 2019, ICLR.

[50] Junjie Yan, et al. Peephole: Predicting Network Performance Before Training, 2017, arXiv.

[51] Charles Blundell, et al. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, 2016, NIPS.

[52] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.

[53] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.

[54] D. Sculley, et al. Google Vizier: A Service for Black-Box Optimization, 2017, KDD.

[55] Hiroaki Kitano. Designing Neural Networks Using Genetic Algorithms with Graph Generation System, 1990, Complex Syst.

[56] Andrey Khorlin, et al. Transfer NAS: Knowledge Transfer between Search Spaces with Transformer Agents, 2019, arXiv.

[57] Andreas Zell, et al. Prune and Replace NAS, 2019, ICMLA.