Few-shot Neural Architecture Search

To improve the search efficiency of Neural Architecture Search (NAS), one-shot NAS trains a single super-net and uses its shared weights to approximate the performance of candidate architectures during the search. While this greatly reduces computation cost, the approximation error makes the performance predicted by a single super-net far less accurate than training each candidate architecture from scratch, which in turn hurts search efficiency. In this work, we propose few-shot NAS, which explores the use of multiple super-nets: each super-net is pre-trained to be in charge of one sub-region of the search space, reducing its prediction error. Moreover, these super-nets can be trained jointly via sequential fine-tuning. A natural choice of sub-regions is to follow how the search space is split in NAS. We empirically evaluate our approach on three different tasks in NAS-Bench-201. Extensive results demonstrate that few-shot NAS, using only 5 super-nets, significantly improves the performance of many search methods with only a slight increase in search time. The architectures found by DARTS and ENAS with few-shot models achieved 88.53% and 86.50% test accuracy on CIFAR-10 in NAS-Bench-201, significantly outperforming their one-shot counterparts (54.30% and 54.30% test accuracy, respectively). Moreover, on AutoGAN and DARTS, few-shot NAS also outperforms the previous state-of-the-art models.
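
To make the splitting idea concrete, below is a minimal sketch (not the authors' implementation) of how a search space could be partitioned and how one sub-region super-net per partition could be derived from the one-shot super-net by sequential fine-tuning. The operation set, class names (`MixedOp`, `SuperNet`), the `fine_tune` routine, and the toy training loop are all illustrative assumptions in the spirit of a NAS-Bench-201-like cell.

```python
# Illustrative sketch of the few-shot NAS idea; names and hyperparameters are hypothetical.
import copy
import random

import torch
import torch.nn as nn
import torch.nn.functional as F

OPS = ["none", "skip_connect", "conv_1x1", "conv_3x3", "avg_pool_3x3"]

class MixedOp(nn.Module):
    """One weight-sharing edge: it owns a module for every candidate operation."""
    def __init__(self, channels):
        super().__init__()
        self.candidates = nn.ModuleDict({
            "skip_connect": nn.Identity(),
            "conv_1x1": nn.Conv2d(channels, channels, kernel_size=1),
            "conv_3x3": nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            "avg_pool_3x3": nn.AvgPool2d(kernel_size=3, stride=1, padding=1),
        })

    def forward(self, x, op_name):
        if op_name == "none":          # the zero operation has no weights
            return torch.zeros_like(x)
        return self.candidates[op_name](x)

class SuperNet(nn.Module):
    """Tiny super-net with two searchable edges; a non-empty `fixed_ops` pins
    edges to one operation, restricting the model to a sub-region of the space."""
    def __init__(self, channels=8, fixed_ops=None):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.edges = nn.ModuleList([MixedOp(channels), MixedOp(channels)])
        self.head = nn.Linear(channels, 10)
        self.fixed_ops = fixed_ops or {}   # e.g. {0: "conv_3x3"}

    def forward(self, x, arch):
        # `arch` names one op per edge; a fixed edge overrides the sampled choice.
        x = self.stem(x)
        for i, edge in enumerate(self.edges):
            x = edge(x, self.fixed_ops.get(i, arch[i]))
        return self.head(x.mean(dim=(2, 3)))

def fine_tune(net, steps=10):
    """Stand-in for super-net training with uniformly sampled architectures
    (random tensors replace real data to keep the sketch self-contained)."""
    opt = torch.optim.SGD(net.parameters(), lr=0.01)
    for _ in range(steps):
        arch = [random.choice(OPS) for _ in net.edges]
        x, y = torch.randn(4, 3, 8, 8), torch.randint(0, 10, (4,))
        loss = F.cross_entropy(net(x, arch), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net

# 1) One-shot baseline: a single super-net trained over the whole search space.
parent = fine_tune(SuperNet())

# 2) Few-shot: split the space by fixing edge 0 to each candidate operation and
#    sequentially fine-tune one sub-super-net per sub-region from the parent weights.
sub_supernets = {}
for op in OPS:
    child = copy.deepcopy(parent)         # inherit the one-shot super-net's weights
    child.fixed_ops = {0: op}             # this copy only covers archs with edge 0 == op
    sub_supernets[op] = fine_tune(child)  # brief specialization lowers prediction error
```

The sketch stops at a one-level split on a single edge, which already yields one super-net per candidate operation; repeating the copy-then-fine-tune step on further edges would give a finer partition, and starting every child from the parent's weights is what keeps the extra training cost modest.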

[1] Yuichi Yoshida, et al. Spectral Normalization for Generative Adversarial Networks, 2018, ICLR.

[2] Thomas S. Huang, et al. Network Slimming by Slimmable Networks: Towards One-Shot Architecture Search for Channel Numbers, 2019, ArXiv.

[3] Shiyu Chang, et al. AutoGAN: Neural Architecture Search for Generative Adversarial Networks, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4] Ramesh Raskar, et al. Designing Neural Network Architectures using Reinforcement Learning, 2016, ICLR.

[5] Frank Hutter, et al. NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search, 2020, ICLR.

[6] Mark Sandler, et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7] Quoc V. Le, et al. Understanding and Simplifying One-Shot Architecture Search, 2018, ICML.

[8] M. Kendall. A New Measure of Rank Correlation, 1938.

[9] Song Han, et al. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, 2018, ICLR.

[10] Hao He, et al. ProbGAN: Towards Probabilistic GAN with Theoretical Guarantees, 2018, ICLR.

[11] Yonggang Hu, et al. MergeNAS: Merge Operations into One for Differentiable Architecture Search, 2020, IJCAI.

[12] Yoshua Bengio, et al. Algorithms for Hyper-Parameter Optimization, 2011, NIPS.

[13] Enhong Chen, et al. Balanced One-shot Neural Architecture Optimization, 2019, ArXiv:1909.10815.

[14] Yiyang Zhao, et al. AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search, 2019, ArXiv.

[15] Aaron Klein, et al. NAS-Bench-101: Towards Reproducible Neural Architecture Search, 2019, ICML.

[16] Lihi Zelnik-Manor, et al. XNAS: Neural Architecture Search with Expert Advice, 2019, NeurIPS.

[17] Yuandong Tian, et al. FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Alok Aggarwal, et al. Regularized Evolution for Image Classifier Architecture Search, 2018, AAAI.

[19] Quoc V. Le, et al. AutoAugment: Learning Augmentation Policies from Data, 2018, ArXiv.

[20] Youhei Akimoto, et al. Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search, 2019, ICML.

[21] Ameet Talwalkar, et al. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization, 2016, J. Mach. Learn. Res.

[22] Sanjeev Arora, et al. An Exponential Learning Rate Schedule for Deep Learning, 2020, ICLR.

[23] Xiangyu Zhang, et al. Single Path One-Shot Neural Architecture Search with Uniform Sampling, 2019, ECCV.

[24] Sanjeev Arora, et al. Theoretical Analysis of Auto Rate-Tuning by Batch Normalization, 2018, ICLR.

[25] Thomas Brox, et al. Understanding and Robustifying Differentiable Architecture Search, 2020, ICLR.

[26] Yi Yang, et al. One-Shot Neural Architecture Search via Self-Evaluated Template Network, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[27] Simon Carbonnelle, et al. On layer-level control of DNN training and its impact on generalization, 2018, ArXiv.

[28] Mathieu Salzmann, et al. How to Train Your Super-Net: An Analysis of Training Heuristics in Weight-Sharing NAS, 2020, ArXiv.

[29] Graham W. Taylor, et al. Improved Regularization of Convolutional Neural Networks with Cutout, 2017, ArXiv.

[30] Wei Pan, et al. BayesNAS: A Bayesian Approach for Neural Architecture Search, 2019, ICML.

[31] Xiangyu Zhang, et al. Angle-based Search Space Shrinking for Neural Architecture Search, 2020, ECCV.

[32] Wojciech Zaremba, et al. Improved Techniques for Training GANs, 2016, NIPS.

[33] Martin Jaggi, et al. Evaluating the Search Phase of Neural Architecture Search, 2019, ICLR.

[34] Bo Zhang, et al. FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search, 2019, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[35] Bo Chen, et al. MnasNet: Platform-Aware Neural Architecture Search for Mobile, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36] Ann Bies, et al. The Penn Treebank: Annotating Predicate Argument Structure, 1994, HLT.

[37] Yingwei Li, et al. AtomNAS: Fine-Grained End-to-End Neural Architecture Search, 2020, ICLR.

[38] Bo Zhang, et al. Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search, 2020, ECCV.

[39] Xiangyu Zhang, et al. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design, 2018, ECCV.

[40] Yi Yang, et al. NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search, 2020, ICLR.

[41] Quoc V. Le, et al. Efficient Neural Architecture Search via Parameter Sharing, 2018, ICML.

[42] Quoc V. Le, et al. Searching for MobileNetV3, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43] Xiaopeng Zhang, et al. PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search, 2020, ICLR.

[44] Tie-Yan Liu, et al. Neural Architecture Optimization, 2018, NeurIPS.

[45] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.

[46] Vijay Vasudevan, et al. Learning Transferable Architectures for Scalable Image Recognition, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47] Yiming Yang, et al. DARTS: Differentiable Architecture Search, 2018, ICLR.

[48] Kevin Leyton-Brown, et al. Sequential Model-Based Optimization for General Algorithm Configuration, 2011, LION.

[49] Seong Joon Oh, et al. CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[50] Jun Wu, et al. Progressive DARTS: Bridging the Optimization Gap for NAS in the Wild, 2019, International Journal of Computer Vision.

[51] Aaron Klein, et al. BOHB: Robust and Efficient Hyperparameter Optimization at Scale, 2018, ICML.

[52] Jian Sun, et al. DetNAS: Backbone Search for Object Detection, 2019, NeurIPS.

[53] Taoping Liu, et al. One-Shot Neural Architecture Search via Novelty Driven Sampling, 2020, IJCAI.

[54] Yuandong Tian, et al. Neural Architecture Search Using Deep Neural Networks and Monte Carlo Tree Search, 2020, AAAI.

[55] Trung Le, et al. MGAN: Training Generative Adversarial Nets with Multiple Generators, 2018, ICLR.

[56] Yuandong Tian, et al. Sample-Efficient Neural Architecture Search by Learning Action Space, 2019, ArXiv.

[57] Niraj K. Jha, et al. ChamNet: Towards Efficient Network Design Through Platform-Aware Model Adaptation, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[58] Chuang Gan, et al. Once for All: Train One Network and Specialize it for Efficient Deployment, 2019, ICLR.

[59] Wei Wang, et al. Improving MMD-GAN Training with Repulsive Loss Function, 2018, ICLR.

[60] Kaiming He, et al. Designing Network Design Spaces, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[61] Li Fei-Fei, et al. Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).