Disturbance-immune Weight Sharing for Neural Architecture Search

Neural architecture search (NAS) has gained increasing attention in the architecture design community. A key factor behind its success is the training efficiency afforded by the weight sharing (WS) technique. However, WS-based NAS methods often suffer from a performance disturbance (PD) issue: because weights are partially shared, training subsequent architectures inevitably disturbs the performance of previously trained architectures. This leads to inaccurate performance estimates for the previous architectures, which makes it hard to learn a good search strategy. To alleviate the PD issue, we propose a new disturbance-immune update strategy for model updating. Specifically, to preserve the knowledge learned by previous architectures, we constrain the training of subsequent architectures to an orthogonal space via orthogonal gradient descent. Equipped with this strategy, we propose a novel disturbance-immune training scheme for NAS. We theoretically analyze the effectiveness of our strategy in alleviating the PD risk. Extensive experiments on CIFAR-10 and ImageNet verify the superiority of our method.
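
The idea of constraining updates to an orthogonal space can be illustrated with a small NumPy sketch. This is not the paper's implementation: it assumes a single shared linear layer, and the names `orthogonal_projector`, `projected_step`, `prev_inputs`, and the regularizer `eps` are hypothetical choices made for illustration. The sketch projects the gradient from a later architecture onto the orthogonal complement of the inputs stored for an earlier architecture, so the layer's outputs on those stored inputs stay (numerically) unchanged.

```python
import numpy as np

def orthogonal_projector(prev_inputs, eps=1e-8):
    """Build P = I - A (A^T A + eps I)^{-1} A^T, where the columns of A
    are inputs stored for a previously trained architecture (assumed setup)."""
    A = np.asarray(prev_inputs, dtype=float)
    d, k = A.shape
    gram = A.T @ A + eps * np.eye(k)  # small eps keeps the solve well conditioned
    return np.eye(d) - A @ np.linalg.solve(gram, A.T)

def projected_step(W, grad, P, lr=0.1):
    """Gradient step restricted to directions orthogonal to the stored inputs."""
    return W - lr * grad @ P

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))            # shared weights of a toy linear layer
prev_inputs = rng.normal(size=(5, 2))  # inputs seen by an earlier architecture
grad = rng.normal(size=(3, 5))         # gradient from a later architecture

P = orthogonal_projector(prev_inputs)
W_new = projected_step(W, grad, P)

# Output change on the earlier architecture's inputs, with and without projection.
drift_projected = np.abs(W_new @ prev_inputs - W @ prev_inputs).max()
drift_plain = np.abs((W - 0.1 * grad) @ prev_inputs - W @ prev_inputs).max()
print(drift_projected, drift_plain)
```

In this toy setting the projected step leaves the outputs on the stored inputs essentially untouched, while the plain step changes them. That difference is the performance disturbance the abstract describes, reduced here to a single layer.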
