AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates

Structured weight pruning is a representative model compression technique for DNNs that reduces storage and computation requirements and accelerates inference. Because the pruning process involves a large number of flexible hyperparameters, an automatic hyperparameter determination process is necessary. This work proposes AutoCompress, an automatic structured pruning framework with the following key performance improvements: (i) it effectively incorporates a combination of structured pruning schemes into the automatic process; (ii) it adopts state-of-the-art ADMM-based structured weight pruning as the core algorithm and proposes an innovative additional purification step for further weight reduction without accuracy loss; and (iii) it develops an effective heuristic search method enhanced by experience-based guided search, replacing the prior deep reinforcement learning technique, which has an underlying incompatibility with the target pruning problem. Extensive experiments on the CIFAR-10 and ImageNet datasets demonstrate that AutoCompress is the key to achieving ultra-high pruning rates in the number of weights and FLOPs that could not be achieved before. As an example, AutoCompress outperforms prior work on automatic model compression by up to 33× in pruning rate (120× reduction in the actual parameter count) under the same accuracy. Significant inference speedups have been observed with the AutoCompress framework in actual measurements on a smartphone. We release the models of this work at an anonymous link: http://bit.ly/2VZ63dS.
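
To make the ADMM-based core concrete, the sketch below shows a generic ADMM formulation of structured (filter-level) pruning in PyTorch: training alternates between retraining the weights W with a quadratic penalty pulling them toward an auxiliary variable Z, projecting W + U onto the structured-sparsity constraint to obtain Z, and a dual update of U. This is a minimal illustration of the general technique, not the authors' implementation; the function and parameter names (`admm_prune`, `project_filters`, `conv_keep_ratios`, `rho`) and the per-layer keep ratios are assumptions for illustration only, and the purification and guided-search components are omitted.

```python
# Minimal sketch of ADMM-based structured (filter) pruning.
# Assumed/illustrative names: admm_prune, project_filters, conv_keep_ratios, rho.
import torch
import torch.nn as nn


def project_filters(weight, keep_ratio):
    """Euclidean projection onto the structured-sparsity set:
    keep the filters (output channels) with the largest L2 norms, zero the rest."""
    num_filters = weight.shape[0]
    k = max(1, int(round(keep_ratio * num_filters)))
    norms = weight.reshape(num_filters, -1).norm(dim=1)
    keep = torch.topk(norms, k).indices
    mask = torch.zeros(num_filters, dtype=torch.bool, device=weight.device)
    mask[keep] = True
    pruned = weight.clone()
    pruned[~mask] = 0.0
    return pruned


def admm_prune(model, loss_fn, data_loader, conv_keep_ratios,
               rho=1e-3, admm_iters=5, epochs_per_iter=1, lr=1e-3):
    """Alternate between (1) SGD on the loss plus an augmented-Lagrangian
    penalty (rho/2)*||W - Z + U||^2 and (2) Z-update by projection with
    dual update of U, for the Conv2d layers listed in conv_keep_ratios."""
    convs = {name: m for name, m in model.named_modules()
             if isinstance(m, nn.Conv2d) and name in conv_keep_ratios}
    Z = {n: project_filters(m.weight.detach(), conv_keep_ratios[n])
         for n, m in convs.items()}
    U = {n: torch.zeros_like(m.weight) for n, m in convs.items()}
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)

    for _ in range(admm_iters):
        for _ in range(epochs_per_iter):
            for x, y in data_loader:
                opt.zero_grad()
                loss = loss_fn(model(x), y)
                # Quadratic penalty pulling each W toward its sparse target Z
                for n, m in convs.items():
                    loss = loss + 0.5 * rho * (m.weight - Z[n] + U[n]).pow(2).sum()
                loss.backward()
                opt.step()
        # Z-update (projection) and dual ascent on U
        for n, m in convs.items():
            W = m.weight.detach()
            Z[n] = project_filters(W + U[n], conv_keep_ratios[n])
            U[n] = U[n] + W - Z[n]

    # Hard-prune: replace W with the structured-sparse Z after the final iteration
    with torch.no_grad():
        for n, m in convs.items():
            m.weight.copy_(Z[n])
    return model
```

In this formulation, the per-layer keep ratios in `conv_keep_ratios` are exactly the hyperparameters that AutoCompress searches over automatically; a fixed, hand-chosen dictionary is shown here only to keep the sketch self-contained.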
