Unifying and Boosting Gradient-Based Training-Free Neural Architecture Search

Neural architecture search (NAS) has gained immense popularity owing to its ability to automate neural architecture design. A number of training-free metrics have recently been proposed to realize NAS without training, hence making NAS more scalable. Despite their competitive empirical performance, a unified theoretical understanding of these training-free metrics is lacking. As a consequence, (a) the relationships among these metrics are unclear, (b) there is no theoretical guarantee for their empirical performance and transferability, and (c) there may exist untapped potential in training-free NAS that can only be unveiled through a unified theoretical understanding. To this end, this paper presents a unified theoretical analysis of gradient-based training-free NAS, which allows us to (a) theoretically study the relationships among these metrics, (b) theoretically guarantee their generalization performance and transferability, and (c) exploit our unified theoretical understanding to develop a novel framework named hybrid NAS (HNAS) that consistently boosts training-free NAS in a principled way. Interestingly, HNAS enjoys the advantages of both training-free NAS (i.e., superior search efficiency) and training-based NAS (i.e., remarkable search effectiveness), which we demonstrate through extensive experiments.
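As a minimal, hedged illustration (not the paper's HNAS framework), the sketch below shows the flavor of a gradient-based training-free metric: the squared gradient norm of the loss at initialization, computed on a single mini-batch and used to rank candidate architectures without any training. It assumes PyTorch; the names grad_norm_score, build, and candidates are hypothetical placeholders.

```python
# Minimal sketch of a generic gradient-based training-free metric
# (illustrative only; not the paper's HNAS method).
import torch
import torch.nn as nn

def grad_norm_score(model: nn.Module,
                    inputs: torch.Tensor,
                    targets: torch.Tensor) -> float:
    """Squared l2-norm of the loss gradient at initialization."""
    model.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    score = 0.0
    for p in model.parameters():
        if p.grad is not None:
            # Accumulate the squared gradient norm over all parameters.
            score += p.grad.detach().pow(2).sum().item()
    return score

# Hypothetical usage: score each candidate architecture on one mini-batch
# and keep the top-ranked ones for further evaluation.
# scores = {arch: grad_norm_score(build(arch), x, y) for arch in candidates}
```

Such metrics require only a forward and backward pass at initialization per candidate, which is what makes training-free NAS far cheaper than training-based search.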
