Efficient Architecture Search for Diverse Tasks

While neural architecture search (NAS) has enabled automated machine learning (AutoML) for well-researched areas, its application to tasks beyond computer vision is still under-explored. As less-studied domains are precisely those where we expect AutoML to have the greatest impact, in this work we study NAS for efficiently solving diverse problems. Seeking an approach that is fast, simple, and broadly applicable, we fix a standard convolutional network (CNN) topology and propose to search for the right kernel sizes and dilations its operations should take on. This dramatically expands the model's capacity to extract features at multiple resolutions for different types of data while only requiring search over the operation space. To overcome the efficiency challenges of naive weight-sharing in this search space, we introduce DASH, a differentiable NAS algorithm that computes the mixture-of-operations using the Fourier diagonalization of convolution, achieving both a better asymptotic complexity and an up-to-10x search time speedup in practice. We evaluate DASH on ten tasks spanning a variety of application domains such as PDE solving, protein folding, and heart disease detection. DASH outperforms state-of-the-art AutoML methods in aggregate, attaining the best-known automated performance on seven tasks. Meanwhile, on six of the ten tasks, the combined search and retraining time is less than 2x slower than simply training a CNN backbone that is far less accurate.
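For intuition only, below is a minimal NumPy sketch (not the authors' implementation; the kernel sizes, dilation rates, and all names are illustrative) of the key idea that a weighted mixture of dilated convolutions can be evaluated as a single convolution in Fourier space: the DFT diagonalizes circular convolution, and convolution is linear in the kernel, so the candidate kernels can be summed before a single FFT-based convolution and the cost of the mixed operation does not grow with the number of candidate operations.

# Hedged sketch: weighted mixture of dilated convolutions via one FFT.
# Kernel sizes, dilations, and variable names are illustrative assumptions.
import numpy as np

def dilate(kernel, d):
    """Insert d-1 zeros between kernel taps (1-D dilation)."""
    k = len(kernel)
    out = np.zeros((k - 1) * d + 1)
    out[::d] = kernel
    return out

def mixed_conv_fourier(x, kernels, dilations, alphas):
    """Weighted mixture of dilated circular convolutions with a single FFT.

    Since the DFT diagonalizes circular convolution and convolution is
    linear in the kernel:
        sum_i alpha_i * (x * k_i) = x * (sum_i alpha_i * k_i),
    so all candidates are aggregated into one kernel before transforming.
    """
    n = len(x)
    combined = np.zeros(n)
    for k, d, a in zip(kernels, dilations, alphas):
        kd = dilate(k, d)
        combined[:len(kd)] += a * kd
    # One circular convolution via the convolution theorem.
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(combined)))

# Usage: three candidate kernels with different sizes and dilations,
# mixed by normalized architecture weights.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
kernels = [rng.standard_normal(k) for k in (3, 5, 7)]
dilations = [1, 2, 1]
alphas = np.exp(np.array([0.1, 0.5, -0.2]))
alphas /= alphas.sum()
y = mixed_conv_fourier(x, kernels, dilations, alphas)

In this toy form, the per-step cost is one FFT of the input regardless of how many kernel sizes and dilations are in the search space, which is the efficiency property the abstract attributes to computing the mixture-of-operations through the Fourier diagonalization of convolution.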
