SI-VDNAS: Semi-Implicit Variational Dropout for Hierarchical One-shot Neural Architecture Search

Bayesian methods have improved the interpretability and stability of neural architecture search (NAS). In this paper, we propose a novel probabilistic approach, namely Semi-Implicit Variational Dropout one-shot Neural Architecture Search (SI-VDNAS), that leverages semi-implicit variational dropout to support architecture search with variable operations and edges. SI-VDNAS achieves stable training that would not be affected by the over-selection of skip-connect operation. Experimental results demonstrate that SI-VDNAS finds a convergent architecture with only 2.7 MB parameters within 0.8 GPU-days and can achieve 2.60% top-1 error rate on CIFAR-10. The convergent architecture can obtain a top-1 error rate of 16.20% and 25.6% when transferred to CIFAR-100 and ImageNet (mobile setting).

[1]  Xiangxiang Chu,et al.  Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search , 2019, ECCV.

[2]  Lingxi Xie,et al.  Stabilizing DARTS with Amended Gradient Estimation on Architectural Parameters , 2019, ArXiv.

[3]  Shifeng Zhang,et al.  DARTS+: Improved Differentiable Architecture Search with Early Stopping , 2019, ArXiv.

[4]  Lingxi Xie,et al.  PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search , 2019, ICLR.

[5]  Rongrong Ji,et al.  Multinomial Distribution Learning for Effective Neural Architecture Search , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Wei Pan,et al.  BayesNAS: A Bayesian Approach for Neural Architecture Search , 2019, ICML.

[7]  Gaofeng Meng,et al.  Differentiable Architecture Search with Ensemble Gumbel-Softmax , 2019, ArXiv.

[8]  Qi Tian,et al.  Progressive Differentiable Architecture Search: Bridging the Depth Gap Between Search and Evaluation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[9]  Lei Zhang,et al.  Variational Bayesian Dropout With a Hierarchical Prior , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Liang Lin,et al.  SNAS: Stochastic Neural Architecture Search , 2018, ICLR.

[11]  Tie-Yan Liu,et al.  Neural Architecture Optimization , 2018, NeurIPS.

[12]  Xiangyu Zhang,et al.  ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.

[13]  Quoc V. Le,et al.  Understanding and Simplifying One-Shot Architecture Search , 2018, ICML.

[14]  Yiming Yang,et al.  DARTS: Differentiable Architecture Search , 2018, ICLR.

[15]  Frank Hutter,et al.  Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution , 2018, ICLR.

[16]  Quoc V. Le,et al.  Efficient Neural Architecture Search via Parameter Sharing , 2018, ICML.

[17]  Alok Aggarwal,et al.  Regularized Evolution for Image Classifier Architecture Search , 2018, AAAI.

[18]  Li Fei-Fei,et al.  Progressive Neural Architecture Search , 2017, ECCV.

[19]  Vijay Vasudevan,et al.  Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[21]  Quoc V. Le,et al.  Large-Scale Evolution of Image Classifiers , 2017, ICML.

[22]  Dmitry P. Vetrov,et al.  Variational Dropout Sparsifies Deep Neural Networks , 2017, ICML.

[23]  Quoc V. Le,et al.  Neural Architecture Search with Reinforcement Learning , 2016, ICLR.

[24]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[28]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .