Differentiable Architecture Search Meets Network Pruning at Initialization: A More Reliable, Efficient, and Flexible Framework

Although Differentiable ARchiTecture Search (DARTS) has become the mainstream paradigm in Neural Architecture Search (NAS) due to its simplicity and efficiency, recent works have found that the performance of the searched architecture barely improves as the optimization proceeds in DARTS, and that the final architecture-parameter magnitudes obtained by DARTS can hardly indicate the importance of operations. These observations reveal that the supervision signal in DARTS may be a poor or unreliable indicator for architecture search, motivating an interesting and promising question: can we measure operation importance without any training under the differentiable paradigm? We provide an affirmative answer by casting NAS as a network pruning at initialization problem. Leveraging recently proposed synaptic saliency criteria from pruning at initialization, we score the importance of candidate operations in differentiable NAS without any training, and accordingly propose a novel framework called training-free differentiable architecture search (FreeDARTS). We show that, without any training, FreeDARTS with different proxy metrics can outperform most NAS baselines across different search spaces. More importantly, FreeDARTS is extremely memory- and computation-efficient because it abandons training in the architecture search phase, which allows it to search over a more flexible space and eliminates the depth gap between architecture search and evaluation. We hope our work inspires more attempts to solve NAS from the perspective of pruning at initialization.
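To make the idea of scoring operations without training concrete, below is a minimal sketch of ranking the candidate operations on a single edge with a SynFlow-style synaptic-saliency proxy. This is an illustration of the general pruning-at-initialization scoring idea, not the authors' exact FreeDARTS procedure; the tiny operation set, channel counts, and input shape are assumptions made for the example.

import torch
import torch.nn as nn

# Illustrative candidate operations on one edge of a cell (assumed, not the paper's full space).
candidate_ops = {
    "conv_3x3": nn.Conv2d(16, 16, 3, padding=1, bias=False),
    "conv_1x1": nn.Conv2d(16, 16, 1, bias=False),
    "skip_connect": nn.Identity(),
}

def synflow_score(op: nn.Module, input_shape=(1, 16, 8, 8)) -> float:
    """Synaptic-flow saliency at initialization: sum of |theta * dL/dtheta|
    with an all-ones input and absolute weights; no labels, no training."""
    params = [p for p in op.parameters() if p.requires_grad]
    if not params:
        return 0.0  # parameter-free ops get zero under a purely parameter-based proxy
    # Temporarily replace weights by their absolute values (SynFlow linearization).
    signs = [p.data.sign() for p in params]
    for p in params:
        p.data.abs_()
    x = torch.ones(input_shape)
    loss = op(x).sum()
    grads = torch.autograd.grad(loss, params)
    score = sum((p.data * g).abs().sum().item() for p, g in zip(params, grads))
    # Restore the original weight signs.
    for p, s in zip(params, signs):
        p.data.mul_(s)
    return score

scores = {name: synflow_score(op) for name, op in candidate_ops.items()}
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # rank operations with zero training

Note that a purely parameter-based saliency assigns zero to parameter-free operations such as skip connections, which is one reason training-free NAS methods typically consider several proxy metrics rather than a single criterion.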
