Neural Architecture Search as Sparse Supernet

This paper aims at enlarging the problem of Neural Architecture Search from Single-Path and Multi-Path Search to automated Mixed-Path Search. In particular, we model the new problem as a sparse supernet with a new continuous architecture representation using a mixture of sparsity constraints, i.e., Sparse Group Lasso. The sparse supernet is expected to automatically achieve sparsely-mixed paths upon a compact set of nodes. To optimize the proposed sparse supernet, we exploit a hierarchical accelerated proximal gradient algorithm within a bi-level optimization framework. Extensive experiments on CIFAR-10, CIFAR-100, Tiny ImageNet and ImageNet demonstrate that the proposed methodology is capable of searching for compact, general and powerful neural architectures.

[1]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[2]  Yiming Yang,et al.  DARTS: Differentiable Architecture Search , 2018, ICLR.

[3]  Marc Teboulle,et al.  A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..

[4]  Bo Zhang,et al.  Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search , 2020, ECCV.

[5]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Danilo Comminiello,et al.  Group sparse regularization for deep neural networks , 2016, Neurocomputing.

[7]  Jun Wang,et al.  Off-Policy Reinforcement Learning for Efficient and Effective GAN Architecture Search , 2020, ECCV.

[8]  M. Yuan,et al.  Model selection and estimation in regression with grouped variables , 2006 .

[9]  Hisashi Kashima,et al.  Fast Sparse Group Lasso , 2019, NeurIPS.

[10]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Chinmay Hegde,et al.  One-Shot Neural Architecture Search via Compressive Sensing , 2019, ArXiv.

[12]  Xiaopeng Zhang,et al.  PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search , 2020, ICLR.

[13]  Qian Zhang,et al.  FasterSeg: Searching for Faster Real-time Semantic Segmentation , 2019, ICLR.

[14]  Tie-Yan Liu,et al.  Neural Architecture Optimization , 2018, NeurIPS.

[15]  Gaofeng Meng,et al.  RENAS: Reinforced Evolutionary Neural Architecture Search , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Song Han,et al.  ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware , 2018, ICLR.

[17]  Shiyu Chang,et al.  AutoGAN: Neural Architecture Search for Generative Adversarial Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  Yuandong Tian,et al.  FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[20]  Oriol Vinyals,et al.  Hierarchical Representations for Efficient Architecture Search , 2017, ICLR.

[21]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[22]  Gaofeng Meng,et al.  DATA: Differentiable ArchiTecture Approximation , 2019, NeurIPS.

[23]  Mathieu Salzmann,et al.  Learning the Number of Neurons in Deep Networks , 2016, NIPS.

[24]  Vijay Vasudevan,et al.  Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Naiyan Wang,et al.  You Only Search Once: Single Shot Neural Architecture Search via Direct Sparse Optimization , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Li Fei-Fei,et al.  Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Alok Aggarwal,et al.  Regularized Evolution for Image Classifier Architecture Search , 2018, AAAI.

[28]  James Zijun Wang,et al.  Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers , 2018, ICLR.

[29]  Bernard Ghanem,et al.  SGAS: Sequential Greedy Architecture Search , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Li Fei-Fei,et al.  Progressive Neural Architecture Search , 2017, ECCV.

[31]  Noah Simon,et al.  A Sparse-Group Lasso , 2013 .

[32]  Fevzi Alimo Methods of Combining Multiple Classiiers Based on Diierent Representations for Pen-based Handwritten Digit Recognition , 1996 .

[33]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[34]  Liang Lin,et al.  SNAS: Stochastic Neural Architecture Search , 2018, ICLR.

[35]  Rongrong Ji,et al.  Multinomial Distribution Learning for Effective Neural Architecture Search , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[36]  Julien Mairal,et al.  Optimization with Sparsity-Inducing Penalties , 2011, Found. Trends Mach. Learn..

[37]  Luc Van Gool,et al.  Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Tao Huang,et al.  GreedyNAS: Towards Fast One-Shot NAS With Greedy Supernet , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Shifeng Zhang,et al.  DARTS+: Improved Differentiable Architecture Search with Early Stopping , 2019, ArXiv.

[40]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[41]  Zhiqiang Shen,et al.  Learning Efficient Convolutional Networks through Network Slimming , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[42]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Naiyan Wang,et al.  Data-Driven Sparse Structure Selection for Deep Neural Networks , 2017, ECCV.

[44]  Yi Lu,et al.  MixPath: A Unified Approach for One-shot Neural Architecture Search , 2020, ArXiv.

[45]  P. Zhao,et al.  The composite absolute penalties family for grouped and hierarchical variable selection , 2009, 0909.0411.

[46]  Wei Pan,et al.  BayesNAS: A Bayesian Approach for Neural Architecture Search , 2019, ICML.

[47]  Jun Wu,et al.  Progressive DARTS: Bridging the Optimization Gap for NAS in the Wild , 2019, International Journal of Computer Vision.

[48]  Qi Tian,et al.  Progressive Differentiable Architecture Search: Bridging the Depth Gap Between Search and Evaluation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[49]  Yiran Chen,et al.  Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.

[50]  Longhui Wei,et al.  GOLD-NAS: Gradual, One-Level, Differentiable , 2020, ArXiv.

[51]  Quoc V. Le,et al.  Neural Architecture Search with Reinforcement Learning , 2016, ICLR.

[52]  Richard Socher,et al.  Regularizing and Optimizing LSTM Language Models , 2017, ICLR.

[53]  F. Bach,et al.  Optimization with Sparsity-Inducing Penalties (Foundations and Trends(R) in Machine Learning) , 2011 .