NPAS: A Compiler-aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration

With the increasing demand to efficiently deploy DNNs on mobile edge devices, it becomes increasingly important to reduce unnecessary computation and increase execution speed. Prior methods towards this goal, including model compression and network architecture search (NAS), are largely performed independently and do not fully account for compiler-level optimizations, which are essential for mobile acceleration. In this work, we first propose (i) a general category of fine-grained structured pruning schemes applicable to various DNN layers, and (ii) a comprehensive, compiler-based automatic code generation framework supporting different DNNs and different pruning schemes, which together bridge the gap between model compression and NAS. We further propose NPAS, a compiler-aware unified network pruning and architecture search framework. To handle the large search space, we propose a meta-modeling procedure based on reinforcement learning, combined with fast evaluation and Bayesian optimization, keeping the total number of training epochs comparable to that of representative NAS frameworks. Our framework achieves 6.7 ms, 5.9 ms, and 3.9 ms ImageNet inference times with 78.2%, 75% (MobileNet-V3 level), and 71% (MobileNet-V2 level) Top-1 accuracy, respectively, on an off-the-shelf mobile phone, consistently outperforming prior work.
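
To make the fine-grained structured pruning category concrete, below is a minimal PyTorch sketch of one member of this family, block-based column pruning: a convolution weight is viewed as a GEMM matrix, partitioned into fixed-size blocks, and whole columns are zeroed out inside each block. The block sizes, the L2-norm ranking criterion, and the function name are illustrative assumptions for this sketch, not the paper's exact scheme or API.

```python
import torch

def block_column_prune(weight, block_rows=16, block_cols=8, keep_per_block=4):
    """Zero out whole columns inside each block of the GEMM view of a conv
    weight; returns the pruned weight and the binary mask.
    (Hypothetical sketch, not the paper's implementation.)"""
    out_ch = weight.shape[0]
    w2d = weight.reshape(out_ch, -1)               # (rows, cols) GEMM view
    rows, cols = w2d.shape
    mask = torch.zeros_like(w2d)
    for r in range(0, rows, block_rows):
        for c in range(0, cols, block_cols):
            block = w2d[r:r + block_rows, c:c + block_cols]
            col_norms = block.norm(dim=0)          # L2 norm of each column
            k = min(keep_per_block, block.shape[1])
            keep = col_norms.topk(k).indices       # columns to retain
            mask[r:r + block_rows, c + keep] = 1.0
    return w2d.mul(mask).reshape(weight.shape), mask.reshape(weight.shape)

# Usage on a hypothetical 3x3 conv layer with 64 output / 32 input channels:
w = torch.randn(64, 32, 3, 3)
pruned, mask = block_column_prune(w)
print(f"sparsity: {1 - mask.mean().item():.2f}")
```

Because the surviving weights within each block share column indices, generated code can iterate over those columns with dense inner loops, which is what makes this finer-than-filter sparsity amenable to compiler optimization, in contrast to fully unstructured pruning.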
