MNGNAS: Distilling Adaptive Combination of Multiple Searched Networks for One-Shot Neural Architecture Search

Recently, neural architecture search (NAS) has attracted great interest in academia and industry. It remains a challenging problem because of the huge search space and computational costs. Recent studies in NAS have mainly focused on weight sharing, which trains a SuperNet only once. However, the branch of the SuperNet corresponding to each subnetwork is not guaranteed to be fully trained; this may not only incur huge computational costs but also affect the architecture ranking in the retraining procedure. We propose a multi-teacher-guided NAS, which incorporates an adaptive ensemble and a perturbation-aware knowledge distillation algorithm into one-shot NAS. An optimization method that finds the optimal descent directions is used to obtain adaptive coefficients for the feature maps of the combined teacher model. In addition, we design a dedicated knowledge distillation process for the optimal architecture and the perturbed ones in each search step, so that better feature maps are learned for later distillation procedures. Comprehensive experiments verify that our approach is flexible and effective. We show improvements in accuracy and search efficiency on standard recognition datasets, as well as an improved correlation between the accuracy estimated by the search algorithm and the true accuracy on NAS benchmark datasets.
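
The adaptive ensemble can be made concrete with a short sketch. The following is a minimal PyTorch illustration (an assumption, not the paper's implementation) of distilling a subnetwork's feature map against a convex combination of teacher feature maps; the softmax-parameterized weights `alpha_logits` stand in for the coefficients that the paper obtains from its descent-direction optimization.

```python
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_feat, teacher_feats, alpha_logits):
    """Feature-map distillation against an adaptive combination of teachers.

    student_feat:  (B, C, H, W) feature map of the sampled subnetwork.
    teacher_feats: list of T tensors of shape (B, C, H, W) from the teachers.
    alpha_logits:  (T,) learnable logits; softmax yields convex weights.
    """
    alphas = F.softmax(alpha_logits, dim=0)            # convex combination weights
    combined = sum(a * f.detach() for a, f in zip(alphas, teacher_feats))
    return F.mse_loss(student_feat, combined)          # hint-style L2 feature matching

# Usage: three teachers, batch of 8, 64-channel 16x16 feature maps.
teachers = [torch.randn(8, 64, 16, 16) for _ in range(3)]
student_feat = torch.randn(8, 64, 16, 16, requires_grad=True)
alpha_logits = torch.zeros(3, requires_grad=True)

loss = ensemble_distillation_loss(student_feat, teachers, alpha_logits)
loss.backward()  # gradients reach both the student features and the weights
```

In this toy form the weights are trained jointly with the student; swapping in coefficients computed from an optimal-descent-direction solve, as the abstract describes, would only change how `alpha_logits` is produced, not the distillation loss itself.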
