Overcoming Multi-Model Forgetting in One-Shot NAS With Diversity Maximization

One-Shot Neural Architecture Search (NAS) significantly improves computational efficiency through weight sharing. However, it also introduces multi-model forgetting during supernet training (the architecture search phase): the performance of previously trained architectures degrades when new architectures with partially shared weights are trained sequentially. To overcome such catastrophic forgetting, the state-of-the-art method assumes that the shared weights are optimal when jointly optimizing a posterior probability; in practice, this strict assumption does not necessarily hold for One-Shot NAS. In this paper, we formulate supernet training in One-Shot NAS as a constrained optimization problem of continual learning, requiring that learning the current architecture should not degrade the performance of previously visited architectures. We propose a Novelty Search based Architecture Selection (NSAS) loss function and show that the posterior probability can be calculated without the strict assumption when the diversity of the selected constraints is maximized. A greedy novelty search method is devised to find the most representative subset of architectures to regularize supernet training. We apply the proposed approach to two One-Shot NAS baselines: random sampling NAS (RandomNAS) and gradient-based sampling NAS (GDAS). Extensive experiments demonstrate that our method enhances the predictive ability of the supernet in One-Shot NAS and achieves competitive performance on CIFAR-10, CIFAR-100, and PTB with high efficiency.
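
The greedy, diversity-maximizing selection of constraint architectures described above can be illustrated with a minimal sketch. This is an illustrative assumption, not the authors' implementation: the discrete architecture encoding, the Hamming-style distance metric, and the names greedy_novelty_select and architecture_distance are all hypothetical choices made for the example. The idea shown is only the core mechanism: each previously sampled architecture's novelty is its minimum distance to the already selected subset, and the most novel candidate is added until the subset reaches the desired size; the resulting subset would then serve as the constraint set that regularizes supernet training.

```python
# Minimal sketch of greedy, diversity-maximizing constraint selection
# (illustrative encoding, metric, and names; not the paper's exact code).
import numpy as np

def architecture_distance(a, b):
    # Hamming-style distance between two discrete architecture encodings,
    # e.g., lists of chosen operation indices per edge (assumed metric).
    return float(np.sum(np.asarray(a) != np.asarray(b)))

def greedy_novelty_select(candidates, subset_size):
    """Greedily pick a diverse subset of previously sampled architectures.

    Novelty of a candidate = its minimum distance to the current subset;
    the most novel candidate is added at each step.
    """
    selected = [candidates[0]]
    remaining = list(candidates[1:])
    while len(selected) < subset_size and remaining:
        novelties = [min(architecture_distance(c, s) for s in selected)
                     for c in remaining]
        best = int(np.argmax(novelties))
        selected.append(remaining.pop(best))
    return selected

if __name__ == "__main__":
    # Toy example: 50 architectures, each encoded as operation indices on 8 edges.
    rng = np.random.default_rng(0)
    history = [rng.integers(0, 5, size=8).tolist() for _ in range(50)]
    constraints = greedy_novelty_select(history, subset_size=5)
    print(constraints)
```

In training, the loss for the currently sampled architecture would be augmented with a regularization term over this constraint subset, so that updating the shared weights does not degrade the selected architectures; the exact form of that term follows the NSAS loss in the paper.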
