ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding

Neural architecture search (NAS) aims to produce the optimal sparse solution from a high-dimensional space spanned by all candidate connections. Current gradient-based NAS methods commonly ignore the sparsity constraint during the search phase, instead projecting the optimized solution onto a sparse one as a post-processing step. As a result, the dense super-net used for search is inefficient to train and exhibits a gap with respect to the projected architecture used for evaluation. In this paper, we formulate neural architecture search as a sparse coding problem. We perform the differentiable search on a compressed lower-dimensional space that has the same validation loss as the original sparse solution space, and recover an architecture by solving the sparse coding problem. The differentiable search and the architecture recovery are optimized in an alternating manner. By doing so, our search network satisfies the sparsity constraint at every update and is efficient to train. To also eliminate the depth and width gaps between the search network and the target-net used in evaluation, we further propose a method that searches and evaluates in one stage under the target-net settings. When training finishes, the architecture variables are absorbed into the network weights, so we obtain the searched architecture and its optimized parameters in a single run. In experiments, our two-stage method requires only 0.05 GPU-days for search on CIFAR-10. Our one-stage method achieves state-of-the-art performance on both CIFAR-10 and ImageNet at only the cost of evaluation time.
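
The architecture-recovery step described above amounts to a standard sparse coding problem, which, as the title suggests, can be solved with ISTA (the iterative shrinkage-thresholding algorithm). Below is a minimal, self-contained sketch of generic ISTA for min_z 0.5*||Az - b||^2 + lam*||z||_1; the names (A, b, z, lam, ista) are illustrative placeholders for this generic formulation, not the paper's actual implementation.

```python
import numpy as np

def soft_threshold(x, tau):
    """Element-wise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(A, b, lam, step=None, n_iters=200):
    """Recover a sparse code z minimizing 0.5*||A z - b||^2 + lam*||z||_1."""
    if step is None:
        # A safe step size is 1/L, where L is the Lipschitz constant of the
        # gradient of the smooth term, i.e., the squared spectral norm of A.
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    z = np.zeros(A.shape[1])
    for _ in range(n_iters):
        # Gradient step on the smooth least-squares term ...
        grad = A.T @ (A @ z - b)
        # ... followed by the shrinkage (proximal) step on the l1 term.
        z = soft_threshold(z - step * grad, step * lam)
    return z

# Toy usage: recover a sparse vector from noiseless random measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
z_true = np.zeros(100)
z_true[[3, 17, 42]] = [1.5, -2.0, 0.7]
b = A @ z_true
z_hat = ista(A, b, lam=0.1)
```

Each iteration alternates a gradient step on the data-fitting term with a soft-thresholding step that drives most coordinates exactly to zero, which is why the iterates stay sparse throughout rather than needing a post-hoc projection.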
