Overcoming Multi-Model Forgetting in One-Shot NAS With Diversity Maximization

One-Shot Neural Architecture Search (NAS) significantly improves computational efficiency through weight sharing. However, it also introduces multi-model forgetting during supernet training (the architecture search phase): the performance of previously trained architectures degrades when new architectures with partially shared weights are trained sequentially. To overcome such catastrophic forgetting, the state-of-the-art method assumes that the shared weights are optimal when jointly optimizing a posterior probability; in practice, this strict assumption does not necessarily hold for One-Shot NAS. In this paper, we formulate supernet training in One-Shot NAS as a constrained optimization problem of continual learning, requiring that learning the current architecture should not degrade the performance of previously visited architectures. We propose a Novelty Search based Architecture Selection (NSAS) loss function and show that the posterior probability can be calculated without the strict assumption when the diversity of the selected constraints is maximized. A greedy novelty search method is devised to find the most representative subset of architectures to regularize supernet training. We apply the proposed approach to two One-Shot NAS baselines: random sampling NAS (RandomNAS) and gradient-based sampling NAS (GDAS). Extensive experiments demonstrate that our method enhances the predictive ability of the supernet in One-Shot NAS and achieves competitive performance on CIFAR-10, CIFAR-100, and PTB with high efficiency.
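
The greedy, diversity-maximizing selection of constraint architectures described above can be illustrated with a minimal sketch. This is an illustrative assumption, not the authors' implementation: the discrete architecture encoding, the Hamming-style distance metric, and the names greedy_novelty_select and architecture_distance are all hypothetical choices made for the example. The idea shown is only the core mechanism: each previously sampled architecture's novelty is its minimum distance to the already selected subset, and the most novel candidate is added until the subset reaches the desired size; the resulting subset would then serve as the constraint set that regularizes supernet training.

```python
# Minimal sketch of greedy, diversity-maximizing constraint selection
# (illustrative encoding, metric, and names; not the paper's exact code).
import numpy as np

def architecture_distance(a, b):
    # Hamming-style distance between two discrete architecture encodings,
    # e.g., lists of chosen operation indices per edge (assumed metric).
    return float(np.sum(np.asarray(a) != np.asarray(b)))

def greedy_novelty_select(candidates, subset_size):
    """Greedily pick a diverse subset of previously sampled architectures.

    Novelty of a candidate = its minimum distance to the current subset;
    the most novel candidate is added at each step.
    """
    selected = [candidates[0]]
    remaining = list(candidates[1:])
    while len(selected) < subset_size and remaining:
        novelties = [min(architecture_distance(c, s) for s in selected)
                     for c in remaining]
        best = int(np.argmax(novelties))
        selected.append(remaining.pop(best))
    return selected

if __name__ == "__main__":
    # Toy example: 50 architectures, each encoded as operation indices on 8 edges.
    rng = np.random.default_rng(0)
    history = [rng.integers(0, 5, size=8).tolist() for _ in range(50)]
    constraints = greedy_novelty_select(history, subset_size=5)
    print(constraints)
```

In training, the loss for the currently sampled architecture would be augmented with a regularization term over this constraint subset, so that updating the shared weights does not degrade the selected architectures; the exact form of that term follows the NSAS loss in the paper.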
