Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration YIFAN GONG∗ and GENG YUAN∗, Northeastern University, USA ZHENG ZHAN, Northeastern University, USA WEI NIU, College of William and Mary, USA ZHENGANG LI, Northeastern University, USA PU ZHAO, Northeastern University, USA YUXUAN CAI, Northeastern University, USA SIJIA LIU,Michigan State University, USA BIN REN, College of William and Mary, USA XUE LIN, Northeastern University, USA XULONG TANG, University of Pittsburgh, USA YANZHI WANG, Northeastern University, USA

[1]  Yanzhi Wang,et al.  Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device , 2020, ArXiv.

[2]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[3]  Xuanzhe Liu,et al.  DeepCache: Principled Cache for Mobile Deep Vision , 2017, MobiCom.

[4]  Hong-Yuan Mark Liao,et al.  YOLOv4: Optimal Speed and Accuracy of Object Detection , 2020, ArXiv.

[5]  Niraj K. Jha,et al.  NeST: A Neural Network Synthesis Tool Based on a Grow-and-Prune Paradigm , 2017, IEEE Transactions on Computers.

[6]  Yiran Chen,et al.  2PFPCE: Two-Phase Filter Pruning Based on Conditional Entropy , 2018, ArXiv.

[7]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Hanan Samet,et al.  Pruning Filters for Efficient ConvNets , 2016, ICLR.

[9]  Wei Wang,et al.  Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks , 2020, ICLR.

[10]  Yanzhi Wang,et al.  An Ultra-Efficient Memristor-Based DNN Framework with Structured Weight Pruning and Quantization Using ADMM , 2019, 2019 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED).

[11]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Jianxin Wu,et al.  ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[13]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[14]  Nicholas D. Lane,et al.  DeepEar: robust smartphone audio sensing in unconstrained acoustic environments using deep learning , 2015, UbiComp.

[15]  Xue Lin,et al.  Real-Time Mobile Acceleration of DNNs: From Computer Vision to Medical Applications , 2021, 2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC).

[16]  Hang Liu,et al.  FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).

[17]  Wei Niu,et al.  PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices , 2020, AAAI.

[18]  Yifan Gong,et al.  RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition , 2020, 2020 57th ACM/IEEE Design Automation Conference (DAC).

[19]  Jing Liu,et al.  Discrimination-aware Channel Pruning for Deep Neural Networks , 2018, NeurIPS.

[20]  Niraj K. Jha,et al.  Grow and Prune Compact, Fast, and Accurate LSTMs , 2018, IEEE Transactions on Computers.

[21]  K. Ota,et al.  Deep Learning for Mobile Multimedia , 2017, ACM Trans. Multim. Comput. Commun. Appl..

[22]  Jieping Ye,et al.  AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates , 2020, AAAI.

[23]  Stephen P. Boyd,et al.  Enhancing Sparsity by Reweighted ℓ1 Minimization , 2007, 0711.1612.

[24]  Song Han,et al.  ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware , 2018, ICLR.

[25]  Nicholas D. Lane,et al.  Squeezing Deep Learning into Mobile and Embedded Devices , 2017, IEEE Pervasive Computing.

[26]  Yanzhi Wang,et al.  A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers , 2018, ECCV.

[27]  Xiaolong Ma,et al.  MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge , 2021, NeurIPS.

[28]  Houqiang Li,et al.  Improving Deep Neural Network Sparsity through Decorrelation Regularization , 2018, IJCAI.

[29]  Zhiqiang Shen,et al.  Learning Efficient Convolutional Networks through Network Slimming , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Yiran Chen,et al.  Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.

[31]  Bo Chen,et al.  NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications , 2018, ECCV.

[32]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[33]  Yanzhi Wang,et al.  Non-structured DNN Weight Pruning Considered Harmful , 2019, ArXiv.

[34]  Shirish Tatikonda,et al.  On optimizing machine learning workloads via kernel fusion , 2015, PPoPP.

[35]  Yurong Chen,et al.  Dynamic Network Surgery for Efficient DNNs , 2016, NIPS.

[36]  Hamed Haddadi,et al.  Deep Learning in Mobile and Wireless Networking: A Survey , 2018, IEEE Communications Surveys & Tutorials.

[37]  G. Evans,et al.  Learning to Optimize , 2008 .

[38]  Nando de Freitas,et al.  Bayesian Optimization in AlphaGo , 2018, ArXiv.

[39]  Stratis Ioannidis,et al.  Radio Frequency Fingerprinting on the Edge , 2022, IEEE Transactions on Mobile Computing.

[40]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Song Han,et al.  Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.

[42]  Yuandong Tian,et al.  FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Jiayu Li,et al.  ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers , 2018, ASPLOS.

[44]  Jiayu Li,et al.  ADAM-ADMM: A Unified, Systematic Framework of Structured Weight Pruning for DNNs , 2018, ArXiv.

[45]  Gopalakrishna Hegde,et al.  CaffePresso: An optimized library for Deep Learning on embedded accelerator-based platforms , 2016, 2016 International Conference on Compliers, Architectures, and Sythesis of Embedded Systems (CASES).

[46]  Yanzhi Wang,et al.  Tiny but Accurate: A Pruned, Quantized and Optimized Memristor Crossbar Framework for Ultra Efficient DNN Implementation , 2019, 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC).

[47]  Bo Chen,et al.  MnasNet: Platform-Aware Neural Architecture Search for Mobile , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[49]  Xiangyu Zhang,et al.  MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[50]  Yifan Gong,et al.  SS-Auto: A Single-Shot, Automatic Structured Weight Pruning Framework of DNNs with Ultra-High Efficiency , 2020, ArXiv.

[51]  Yanzhi Wang,et al.  An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices , 2020, ECCV.

[52]  Mingjie Sun,et al.  Rethinking the Value of Network Pruning , 2018, ICLR.

[53]  Michael Carbin,et al.  The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks , 2018, ICLR.

[54]  Alan Edelman,et al.  Julia: A Fresh Approach to Numerical Computing , 2014, SIAM Rev..

[55]  Nicholas D. Lane,et al.  DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices , 2016, 2016 15th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN).

[56]  Shaohan Hu,et al.  DeepSense: A Unified Deep Learning Framework for Time-Series Mobile Sensing Data Processing , 2016, WWW.

[57]  Berthold Reinwald,et al.  On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML , 2018, Proc. VLDB Endow..

[58]  Yanzhi Wang,et al.  PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning , 2020, ASPLOS.

[59]  Yi Yang,et al.  Network Pruning via Transformable Architecture Search , 2019, NeurIPS.

[60]  Xuehai Qian,et al.  Non-Structured DNN Weight Pruning--Is It Beneficial in Any Platform? , 2021, IEEE transactions on neural networks and learning systems.

[61]  Rajesh Krishna Balan,et al.  DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications , 2017, MobiSys.

[62]  Ping Liu,et al.  Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[63]  Quoc V. Le,et al.  Neural Architecture Search with Reinforcement Learning , 2016, ICLR.

[64]  Caiwen Ding,et al.  A SOT-MRAM-based Processing-In-Memory Engine for Highly Compressed DNN Implementation , 2019, ArXiv.

[65]  Aaron Klein,et al.  Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets , 2016, AISTATS.

[66]  Larry S. Davis,et al.  NISP: Pruning Networks Using Neuron Importance Score Propagation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[67]  Alec Wolman,et al.  MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints , 2016, MobiSys.

[68]  Wei Wu,et al.  Practical Block-Wise Neural Network Architecture Generation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[69]  Yanzhi Wang,et al.  Achieving Real-Time Execution of Transformer-based Large-scale Models on Mobile with Compiler-aware Neural Architecture Optimization , 2020, ArXiv.

[70]  Baoyuan Wu,et al.  Compressing Convolutional Neural Networks via Factorized Convolutional Filters , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[71]  Yifan Gong,et al.  BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method , 2020, ArXiv.

[72]  Niraj K. Jha,et al.  ChamNet: Towards Efficient Network Design Through Platform-Aware Model Adaptation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[73]  Haichen Shen,et al.  TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.

[74]  Yanzhi Wang,et al.  ResNet Can Be Pruned 60×: Introducing Network Purification and Unused Path Removal (P-RM) after Weight Pruning , 2019, 2019 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH).

[75]  Bingbing Ni,et al.  Variational Convolutional Neural Network Pruning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[76]  Yanzhi Wang,et al.  YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design , 2020, AAAI.

[77]  Song Han,et al.  AMC: AutoML for Model Compression and Acceleration on Mobile Devices , 2018, ECCV.

[78]  Xiangyu Zhang,et al.  Channel Pruning for Accelerating Very Deep Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[79]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[80]  Wei Niu,et al.  Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search , 2021, ArXiv.

[81]  Jieping Ye,et al.  AutoSlim: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates , 2019, ArXiv.