LANA: Latency Aware Network Acceleration

We introduce latency-aware network acceleration (LANA), an approach that builds on neural architecture search techniques and teacher-student distillation to accelerate neural networks. LANA consists of two phases: in the first phase, it trains many alternative operations for every layer of the teacher network using layer-wise feature-map distillation. In the second phase, it solves the combinatorial selection of efficient operations with a novel constrained integer linear programming (ILP) approach. The ILP formulation brings unique properties: it (i) performs the search within seconds to minutes, (ii) easily satisfies budget constraints, (iii) operates at layer granularity, and (iv) supports a huge search space, surpassing prior search approaches in both efficacy and efficiency. In extensive experiments, we show that LANA yields efficient and accurate models under a target latency budget while being significantly faster than other techniques. We analyze three popular network architectures, EfficientNetV1, EfficientNetV2, and ResNeSt, and achieve up to 3.0% accuracy improvement for all models when compressing larger models to the latency level of smaller models. LANA achieves significant speed-ups (up to 5×) with minor to no accuracy drop on GPU and CPU. The code will be available soon.
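To make the first phase concrete, the following is a minimal PyTorch-style sketch of layer-wise feature-map distillation: every candidate operation is trained to reproduce its teacher layer's output given that layer's input. The names `teacher_layers`, `candidate_ops`, and `loader` are illustrative placeholders and the joint MSE objective is an assumption for exposition, not the paper's exact training recipe.

```python
import torch
import torch.nn.functional as F

def train_candidates(teacher_layers, candidate_ops, loader, epochs=1, lr=1e-3):
    """Phase 1 (sketch): fit every candidate op to mimic its teacher layer.

    teacher_layers: list of frozen teacher modules, one per layer.
    candidate_ops:  candidate_ops[i] is a list of alternative modules for layer i.
    """
    params = [p for ops in candidate_ops for op in ops for p in op.parameters()]
    optimizer = torch.optim.Adam(params, lr=lr)

    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():
                # Collect the teacher's per-layer inputs/outputs in one pass.
                feats = [x]
                for layer in teacher_layers:
                    feats.append(layer(feats[-1]))
            loss = 0.0
            for i, ops in enumerate(candidate_ops):
                for op in ops:
                    # Each candidate sees the teacher's input to layer i and
                    # regresses the teacher's output of layer i (MSE distillation).
                    loss = loss + F.mse_loss(op(feats[i]), feats[i + 1])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

The second phase can be illustrated with an off-the-shelf ILP solver. Below is a hedged sketch using PuLP: binary variables pick exactly one operation per layer, a budget constraint bounds the summed per-operation latencies, and the objective maximizes summed per-layer accuracy-proxy scores. The inputs `scores`, `latencies`, and `budget_ms` are assumed to come from phase 1 and on-device profiling; the exact scoring used by LANA is not reproduced here.

```python
import pulp

def select_ops(scores, latencies, budget_ms):
    """Phase 2 (sketch): pick one candidate per layer under a latency budget.

    scores[i][j]    -- accuracy proxy for candidate j at layer i (higher is better).
    latencies[i][j] -- measured latency (ms) of candidate j at layer i.
    """
    n_layers = len(scores)
    prob = pulp.LpProblem("lana_selection", pulp.LpMaximize)
    x = [[pulp.LpVariable(f"x_{i}_{j}", cat="Binary")
          for j in range(len(scores[i]))] for i in range(n_layers)]

    # Exactly one operation is chosen per layer.
    for i in range(n_layers):
        prob += pulp.lpSum(x[i]) == 1

    # Total latency of the selected operations must respect the budget.
    prob += pulp.lpSum(latencies[i][j] * x[i][j]
                       for i in range(n_layers)
                       for j in range(len(scores[i]))) <= budget_ms

    # Maximize the summed per-layer scores.
    prob += pulp.lpSum(scores[i][j] * x[i][j]
                       for i in range(n_layers)
                       for j in range(len(scores[i])))

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [max(range(len(scores[i])), key=lambda j: pulp.value(x[i][j]))
            for i in range(n_layers)]
```

Because the problem has only one binary variable per (layer, candidate) pair, a generic solver typically returns a selection in seconds, which is what makes it cheap to re-run the search for different latency budgets.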
