K-shot NAS: Learnable Weight-Sharing for NAS with K-shot Supernets

In one-shot weight sharing for NAS, the weights of each operation (at each layer) are shared identically by all architectures (paths) in the supernet. This rules out any adjustment of operation weights to cater to different paths, which limits the reliability of the evaluation results. In this paper, instead of relying on a single supernet, we introduce K-shot supernets and treat their weights for each operation as a dictionary. The operation weight for each path is represented as a convex combination of the dictionary items under a simplex code. This enables an approximation of the stand-alone weight matrix with a higher rank (K > 1). A simplex-net is introduced to produce an architecture-customized code for each path. As a result, all paths can adaptively learn how to share weights across the K-shot supernets and acquire corresponding weights for better evaluation. The K-shot supernets and the simplex-net can be trained iteratively, and we further extend the search to the channel dimension. Extensive experiments on benchmark datasets validate that K-shot NAS significantly improves the evaluation accuracy of paths and thus brings notable performance improvements.
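To make the weight-sharing mechanism concrete, below is a minimal PyTorch sketch of the core idea: each candidate operation keeps K dictionary weights (one per supernet), and a small simplex-net maps a path encoding to softmax-normalized codes that mix those weights into a path-specific convex combination. The module names (KShotConv, SimplexNet), the MLP layout, and the hyper-parameters are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KShotConv(nn.Module):
    """One candidate operation whose effective weight is a convex combination
    of K dictionary weights (one per supernet), mixed by a simplex code."""

    def __init__(self, in_ch, out_ch, kernel_size, k=4):
        super().__init__()
        self.k = k
        # K copies of this operation's weight: the "dictionary".
        self.weights = nn.Parameter(
            torch.randn(k, out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.padding = kernel_size // 2

    def forward(self, x, simplex_code):
        # simplex_code: shape (K,), non-negative and summing to 1.
        # Mix the K dictionary weights into one path-specific weight.
        w = torch.einsum('k,koihw->oihw', simplex_code, self.weights)
        return F.conv2d(x, w, padding=self.padding)


class SimplexNet(nn.Module):
    """Tiny MLP mapping a path (architecture) encoding to one simplex code
    per layer; softmax keeps each code on the probability simplex."""

    def __init__(self, arch_dim, num_layers, k=4, hidden=64):
        super().__init__()
        self.num_layers, self.k = num_layers, k
        self.mlp = nn.Sequential(
            nn.Linear(arch_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_layers * k))

    def forward(self, arch_encoding):
        logits = self.mlp(arch_encoding).view(self.num_layers, self.k)
        return F.softmax(logits, dim=-1)


# Usage sketch: sample a path encoding, get per-layer codes, run one layer.
simplex_net = SimplexNet(arch_dim=20, num_layers=1, k=4)
layer = KShotConv(in_ch=16, out_ch=16, kernel_size=3, k=4)
codes = simplex_net(torch.randn(20))        # (num_layers, K)
out = layer(torch.randn(1, 16, 32, 32), codes[0])
```

In an actual search loop, the K-shot supernet weights and the simplex-net would be updated in alternation over sampled paths, as described in the abstract; this sketch only shows how one path obtains its mixed operation weight.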
