DARTS: Differentiable Architecture Search

This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches that apply evolution or reinforcement learning over a discrete, non-differentiable search space, our method is based on a continuous relaxation of the architecture representation, allowing efficient architecture search via gradient descent. Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels at discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. Our implementation is publicly available to facilitate further research on efficient architecture search algorithms.
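
The core of the continuous relaxation can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration, not the released implementation: the candidate operations, hyperparameters, and random-tensor "losses" are placeholder assumptions, and it uses a simple first-order alternation between weight and architecture updates rather than the paper's full bilevel optimization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative candidate operations for one edge; the paper's actual
# search space (separable convs, dilated convs, pooling, etc.) is richer.
def candidate_ops(channels):
    return [
        nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        nn.AvgPool2d(3, stride=1, padding=1),
        nn.Identity(),
    ]

class MixedOp(nn.Module):
    """Continuous relaxation of a single edge: instead of choosing one
    discrete operation, output a softmax-weighted sum of all candidates,
    which makes the architecture choice differentiable in alpha."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops(channels))
        # Architecture parameters: one logit per candidate operation.
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=-1)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def discretize(self):
        # After search, keep only the strongest operation on this edge.
        return self.ops[self.alpha.argmax().item()]

# First-order approximation of the bilevel search: alternate between
# updating network weights w on training data and architecture
# parameters alpha on held-out validation data.
edge = MixedOp(channels=16)
w_params = [p for n, p in edge.named_parameters() if n != "alpha"]
w_opt = torch.optim.SGD(w_params, lr=0.025, momentum=0.9)
a_opt = torch.optim.Adam([edge.alpha], lr=3e-4)

for step in range(100):
    x_train = torch.randn(8, 16, 32, 32)  # stand-in for a training batch
    x_valid = torch.randn(8, 16, 32, 32)  # stand-in for a validation batch

    # 1) Update alpha to reduce the (dummy) validation loss.
    a_opt.zero_grad()
    edge(x_valid).pow(2).mean().backward()
    a_opt.step()

    # 2) Update weights w to reduce the (dummy) training loss.
    w_opt.zero_grad()
    edge(x_train).pow(2).mean().backward()
    w_opt.step()
```

Once the search converges, each edge is discretized by keeping its highest-weighted operation (the `discretize` helper above), recovering an ordinary discrete architecture from the relaxed one.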
