NAS-Bench-x11 and the Power of Learning Curves

In the past few years, algorithms for neural architecture search (NAS) have been used to automatically find architectures that achieve state-of-the-art performance on various datasets. In 2019, there were calls for reproducible and fair comparisons within NAS research [15, 16], due both to the lack of a consistent training pipeline across papers and to experiments with too few trials to reach statistically significant conclusions. These concerns spurred the release of tabular benchmarks, such as NAS-Bench-101 [25] and NAS-Bench-201 [4], created by fully training all non-isomorphic architectures in search spaces of size 423 000 and 6 466, respectively. These benchmarks allow researchers to easily simulate NAS experiments, making it possible to quickly run fair NAS comparisons, with enough trials to reach statistical significance, at very little computational cost [9].

Recently, to extend the benefits of tabular NAS benchmarks to larger, more realistic NAS search spaces which cannot be evaluated exhaustively, it was proposed to construct surrogate NAS benchmarks. The first such surrogate benchmark is NAS-Bench-301 [21], which was created by training 60 000 architectures from the DARTS [17] search space, and then fitting a surrogate model which can be used to estimate the performance of all roughly 10^18 architectures in the DARTS search space.

Since 2019, dozens of papers have used these NAS benchmarks to develop new algorithms. A downside of these benchmarks is that the main type of algorithm that can be benchmarked is the single-fidelity algorithm: when the NAS algorithm chooses to evaluate an architecture, the architecture is fully trained and only the final validation accuracy is returned. This is because NAS-Bench-301 contains only the architectures' final accuracies, not their full learning curves.
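To make the single-fidelity query model concrete, the sketch below simulates how a NAS algorithm interacts with a tabular benchmark. The benchmark, architecture encodings, and accuracy values here are all hypothetical toy stand-ins, not the API of any real benchmark; the point is only that each query returns a single precomputed final accuracy, so an entire search run costs table lookups instead of GPU-days.

```python
import random

# Hypothetical toy stand-in for a tabular NAS benchmark: a lookup table
# mapping architecture encodings to precomputed final validation accuracies.
# (Real benchmarks such as NAS-Bench-101/201 expose a query interface with
# this same single-fidelity behavior, but via their own APIs.)
TOY_BENCHMARK = {
    ("conv3x3", "conv1x1", "maxpool"): 0.912,
    ("conv3x3", "conv3x3", "conv1x1"): 0.927,
    ("maxpool", "conv1x1", "conv3x3"): 0.895,
    ("conv1x1", "conv1x1", "conv1x1"): 0.880,
}

def query(arch):
    """Single-fidelity query: simulates fully training `arch` and
    returns only the final validation accuracy, nothing in between."""
    return TOY_BENCHMARK[arch]

def random_search(n_iters, seed=0):
    """Random search, simulated at table-lookup cost."""
    rng = random.Random(seed)
    archs = list(TOY_BENCHMARK)
    best_arch, best_acc = None, float("-inf")
    for _ in range(n_iters):
        arch = rng.choice(archs)      # sample a candidate architecture
        acc = query(arch)             # "train" it via one table lookup
        if acc > best_acc:
            best_arch, best_acc = arch, acc
    return best_arch, best_acc

best_arch, best_acc = random_search(n_iters=10)
print(best_arch, best_acc)
```

A multi-fidelity method such as Hyperband [49] could not be simulated this way: it needs the accuracy of a partially trained architecture at intermediate epochs, i.e. the learning curve, which a final-accuracy-only table does not store.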

[1] Colin White, et al. BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search, 2019, AAAI.

[2] G. H. Golub, et al. Calculating the Singular Values and Pseudoinverse of a Matrix, 1965.

[3] Frank Hutter, et al. Neural Architecture Search: A Survey, 2018, J. Mach. Learn. Res.

[4] Xiangning Chen, et al. DrNAS: Dirichlet Neural Architecture Search, 2020, ICLR.

[5] Hiroaki Kitano, et al. Designing Neural Networks Using Genetic Algorithms with Graph Generation System, 1990, Complex Syst.

[6] Tianqi Chen, et al. XGBoost: A Scalable Tree Boosting System, 2016, KDD.

[7] P. A. N. Bosman, et al. Local Search is a Remarkably Strong Baseline for Neural Architecture Search, 2020, EMO.

[8] D. A. Kenny, et al. Correlation and Causation, 1982.

[9] Hanxiao Liu, et al. Neural Predictor for Neural Architecture Search, 2019, ECCV.

[10] Ramesh Raskar, et al. Accelerating Neural Architecture Search using Performance Prediction, 2017, ICLR.

[11] Maria-Florina Balcan, et al. Geometry-Aware Gradient Algorithms for Neural Architecture Search, 2020, ICLR.

[12] F. Hutter, et al. Understanding and Robustifying Differentiable Architecture Search, 2019, ICLR.

[13] B. Gabrys, et al. NATS-Bench: Benchmarking NAS Algorithms for Architecture Topology and Size, 2020, IEEE Trans. Pattern Anal. Mach. Intell.

[14] Ameet Talwalkar, et al. Random Search and Reproducibility for Neural Architecture Search, 2019, UAI.

[15] Qi Li, et al. Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search, 2020, NeurIPS.

[16] Josif Grabocka, et al. NASLib: A Modular and Flexible Neural Architecture Search Library, 2020.

[17] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[18] Frank Hutter, et al. DEHB: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization, 2021, IJCAI.

[19] Julien N. Siems, et al. NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search, 2020, ICLR.

[20] Quoc V. Le, et al. Understanding and Simplifying One-Shot Architecture Search, 2018, ICML.

[21] Evgeny Burnaev, et al. NAS-Bench-NLP: Neural Architecture Search Benchmark for Natural Language Processing, 2020, IEEE Access.

[22] Tie-Yan Liu, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree, 2017, NIPS.

[23] Alok Aggarwal, et al. Regularized Evolution for Image Classifier Architecture Search, 2018, AAAI.

[24] Jasper Snoek, et al. Freeze-Thaw Bayesian Optimization, 2014, ArXiv.

[25] Yujun Li, et al. An Asymptotically Optimal Multi-Armed Bandit Algorithm and Hyperparameter Optimization, 2020, ArXiv.

[26] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[27] Kirthevasan Kandasamy, et al. Multi-fidelity Bayesian Optimisation with Continuous Approximations, 2017, ICML.

[28] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.

[29] Li Fei-Fei, et al. Progressive Neural Architecture Search, 2017, ECCV.

[30] Fabio Maria Carlucci, et al. NAS evaluation is frustratingly hard, 2019, ICLR.

[31] Mi Zhang, et al. Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?, 2020, NeurIPS.

[32] Jakub M. Tomczak, et al. Combinatorial Bayesian Optimization using the Graph Cartesian Product, 2019, NeurIPS.

[33] Xiaowen Dong, et al. Neural Architecture Search using Bayesian Optimisation with Weisfeiler-Lehman Kernel, 2020, ArXiv.

[34] Louis C. Tiao, et al. Model-based Asynchronous Hyperparameter and Neural Architecture Search, 2020.

[35] Quoc V. Le, et al. Neural Architecture Search with Reinforcement Learning, 2016, ICLR.

[36] Kirthevasan Kandasamy, et al. Multi-fidelity Gaussian Process Bandit Optimisation, 2016, J. Artif. Intell. Res.

[37] Yi Yang, et al. NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search, 2020, ICLR.

[38] Yash Savani, et al. Local Search is State of the Art for NAS Benchmarks, 2020, ArXiv.

[39] Aaron Klein, et al. NAS-Bench-101: Towards Reproducible Neural Architecture Search, 2019, ICML.

[40] Martin Jaggi, et al. Evaluating the Search Phase of Neural Architecture Search, 2019, ICLR.

[41] John Langford, et al. Efficient Forward Architecture Search, 2019, NeurIPS.

[42] Geoffrey J. Gordon, et al. DeepArchitect: Automatically Designing and Training Deep Architectures, 2017, ArXiv.

[43] Yu Wang, et al. A Surgery of the Neural Architecture Evaluators, 2020.

[44] Kirthevasan Kandasamy, et al. Neural Architecture Search with Bayesian Optimisation and Optimal Transport, 2018, NeurIPS.

[45] Marius Lindauer, et al. Best Practices for Scientific Research on Neural Architecture Search, 2019, ArXiv.

[46] David R. So, et al. Carbon Emissions and Large Neural Network Training, 2021, ArXiv.

[47] Chen Wei, et al. NPENAS: Neural Predictor Guided Evolution for Neural Architecture Search, 2020, ArXiv.

[48] Zhe Feng, et al. CATE: Computation-aware Neural Architecture Encoding with Transformers, 2021, ICML.

[49] Ameet Talwalkar, et al. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization, 2016, J. Mach. Learn. Res.

[50] Enhong Chen, et al. Semi-Supervised Neural Architecture Search, 2020, NeurIPS.

[51] J. Kwok, et al. Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS, 2020, NeurIPS.

[52] Michael A. Osborne, et al. Bayesian Optimization for Iterative Learning, 2019, NeurIPS.

[53] Gene H. Golub, et al. Calculating the singular values and pseudo-inverse of a matrix, 2007, Milestones in Matrix Computation.

[54] Aaron Klein, et al. Learning Curve Prediction with Bayesian Neural Networks, 2016, ICLR.

[55] Aaron Klein, et al. BOHB: Robust and Efficient Hyperparameter Optimization at Scale, 2018, ICML.

[56] Lukás Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.

[57] Kevin G. Jamieson, et al. A System for Massively Parallel Hyperparameter Tuning, 2018, MLSys.

[58] Aaron Klein, et al. Probabilistic Rollouts for Learning Curve Extrapolation Across Hyperparameter Settings, 2019, ArXiv.

[59] Chen Zhang, et al. Deeper Insights into Weight Sharing in Neural Architecture Search, 2020, ArXiv.

[60] Kevin Leyton-Brown, et al. Efficient Benchmarking of Hyperparameter Optimizers via Surrogates, 2015, AAAI.

[61] D. W. Scott, et al. Multivariate Density Estimation: Theory, Practice and Visualization, 1992.

[62] Mathieu Salzmann, et al. Landmark Regularization: Ranking Guided Super-Net Training in Neural Architecture Search, 2021, CVPR.

[63] W. Neiswanger, et al. A Study on Encodings for Neural Architecture Search, 2020, NeurIPS.

[64] F. Hutter, et al. How Powerful are Performance Predictors in Neural Architecture Search?, 2021, NeurIPS.

[65] Yiming Yang, et al. DARTS: Differentiable Architecture Search, 2018, ICLR.

[66] Longhui Wei, et al. Weight-Sharing Neural Architecture Search: A Battle to Shrink the Optimization Gap, 2020, ACM Comput. Surv.

[67] PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search, 2019, ICLR.

[68] Aaron Klein, et al. Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets, 2016, AISTATS.

[69] Risto Miikkulainen, et al. Evolving Neural Networks through Augmenting Topologies, 2002, Evolutionary Computation.

[70] Vijay Vasudevan, et al. Learning Transferable Architectures for Scalable Image Recognition, 2018, CVPR.

[71] Samin Ishtiaq, et al. NAS-Bench-ASR: Reproducible Neural Architecture Search for Speech Recognition, 2021, ICLR.

[72] Mark van der Wilk, et al. Revisiting the Train Loss: an Efficient Performance Estimator for Neural Architecture Search, 2020, ArXiv.

[73] M. Kendall. A New Measure of Rank Correlation, 1938.

[74] Frank Hutter, et al. A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets, 2017, ArXiv.

[75] Peter M. Todd, et al. Designing Neural Networks using Genetic Algorithms, 1989, ICGA.

[76] Frank Hutter, et al. Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves, 2015, IJCAI.

[77] Margret Keuper, et al. NAS-Bench-301 and the Case for Surrogate Benchmarks for Neural Architecture Search, 2020, ArXiv.

[78] Tao Huang, et al. GreedyNAS: Towards Fast One-Shot NAS With Greedy Supernet, 2020, CVPR.

[79] Ian R. Lane, et al. Speeding up Hyper-parameter Optimization by Extrapolation of Learning Curves Using Previous Builds, 2017, ECML/PKDD.

[80] Yi Yang, et al. Searching for a Robust Neural Architecture in Four GPU Hours, 2019, CVPR.

[81] Quoc V. Le, et al. Efficient Neural Architecture Search via Parameter Sharing, 2018, ICML.