Joint Multi-Dimension Pruning

We present joint multi-dimension pruning (named as JointPruning), a new perspective of pruning a network on three crucial aspects: spatial, depth and channel simultaneously. The joint strategy enables to search a better status than previous studies that focused on individual dimension solely, as our method is optimized collaboratively across the three dimensions in a single end-to-end training. Moreover, each dimension that we consider can promote to get better performance through colluding with the other two. Our method is realized by the adapted stochastic gradient estimation. Extensive experiments on large-scale ImageNet dataset across a variety of network architectures MobileNet V1&V2 and ResNet demonstrate the effectiveness of our proposed method. For instance, we achieve significant margins of 2.5% and 2.6% improvement over the state-of-the-art approach on the already compact MobileNet V1&V2 under an extremely large compression ratio.

[1]  Song Han,et al.  ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware , 2018, ICLR.

[2]  Thomas Bäck,et al.  A Survey of Evolution Strategies , 1991, ICGA.

[3]  Bichen Wu,et al.  Efficient Deep Neural Networks , 2019, ArXiv.

[4]  Jianxin Wu,et al.  AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference , 2018, Pattern Recognit..

[5]  Kaiming He,et al.  Exploring Randomly Wired Neural Networks for Image Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[7]  Richard E. Turner,et al.  Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning , 2017, NIPS.

[8]  Zhiqiang Shen,et al.  Learning Efficient Convolutional Networks through Network Slimming , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Bo Chen,et al.  NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications , 2018, ECCV.

[10]  Chuang Gan,et al.  Once for All: Train One Network and Specialize it for Efficient Deployment , 2019, ICLR.

[11]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[12]  Rongrong Ji,et al.  Accelerating Convolutional Networks via Global & Dynamic Filter Pruning , 2018, IJCAI.

[13]  Liang Lin,et al.  SNAS: Stochastic Neural Architecture Search , 2018, ICLR.

[14]  Xiangyu Zhang,et al.  Channel Pruning for Accelerating Very Deep Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15]  Xiangyu Zhang,et al.  ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Jie Liu,et al.  Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours , 2019, ECML/PKDD.

[17]  Xi Chen,et al.  Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.

[18]  Hans-Paul Schwefel,et al.  Evolution strategies – A comprehensive introduction , 2002, Natural Computing.

[19]  Yiming Yang,et al.  DARTS: Differentiable Architecture Search , 2018, ICLR.

[20]  Ping Liu,et al.  Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Yann LeCun,et al.  Optimal Brain Damage , 1989, NIPS.

[22]  Pavlo Molchanov,et al.  Importance Estimation for Neural Network Pruning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Vijay Vasudevan,et al.  Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[25]  Song Han,et al.  AMC: AutoML for Model Compression and Acceleration on Mobile Devices , 2018, ECCV.

[26]  Jungong Han,et al.  Centripetal SGD for Pruning Very Deep Convolutional Networks With Complicated Structure , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Yurii Nesterov,et al.  Random Gradient-Free Minimization of Convex Functions , 2015, Foundations of Computational Mathematics.

[28]  Xiangyu Zhang,et al.  ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.

[29]  Jing Liu,et al.  Effective Training of Convolutional Neural Networks With Low-Bitwidth Weights and Activations , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Niraj K. Jha,et al.  ChamNet: Towards Efficient Network Design Through Platform-Aware Model Adaptation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Koray Kavukcuoglu,et al.  PGQ: Combining policy gradient and Q-learning , 2016, ArXiv.

[32]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[33]  Amos Storkey,et al.  A Closer Look at Structured Pruning for Neural Network Compression , 2018 .

[34]  Liujuan Cao,et al.  Towards Optimal Structured CNN Pruning via Generative Adversarial Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Mingkui Tan,et al.  Training Quantized Neural Networks With a Full-Precision Auxiliary Module , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Ning Xu,et al.  Slimmable Neural Networks , 2018, ICLR.

[37]  Xiangyu Zhang,et al.  MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[38]  Yuandong Tian,et al.  FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Quoc V. Le,et al.  Understanding and Simplifying One-Shot Architecture Search , 2018, ICML.

[40]  Jing Liu,et al.  Discrimination-aware Channel Pruning for Deep Neural Networks , 2018, NeurIPS.

[41]  Yurong Chen,et al.  Dynamic Network Surgery for Efficient DNNs , 2016, NIPS.

[42]  Shuicheng Yan,et al.  Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Xiangyu Zhang,et al.  Single Path One-Shot Neural Architecture Search with Uniform Sampling , 2019, ECCV.

[44]  Bingbing Ni,et al.  Variational Convolutional Neural Network Pruning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Quoc V. Le,et al.  EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.

[46]  Qi Tian,et al.  Accelerate CNN via Recursive Bayesian Pruning , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[47]  Koray Kavukcuoglu,et al.  Combining policy gradient and Q-learning , 2016, ICLR.

[48]  Gregory J. Wolff,et al.  Optimal Brain Surgeon and general network pruning , 1993, IEEE International Conference on Neural Networks.

[49]  Jianxin Wu,et al.  ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[50]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[51]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Guy Lever,et al.  Deterministic Policy Gradient Algorithms , 2014, ICML.

[53]  Song Han,et al.  Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.

[54]  Hanan Samet,et al.  Pruning Filters for Efficient ConvNets , 2016, ICLR.

[55]  Bo Chen,et al.  MnasNet: Platform-Aware Neural Architecture Search for Mobile , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Tom Schaul,et al.  Stochastic search using the natural gradient , 2009, ICML '09.

[57]  Amos J. Storkey,et al.  Pruning neural networks: is it time to nip it in the bud? , 2018, ArXiv.

[58]  Naiyan Wang,et al.  Data-Driven Sparse Structure Selection for Deep Neural Networks , 2017, ECCV.

[59]  Diana Marculescu,et al.  Towards Efficient Model Compression via Learned Global Ranking , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[61]  Alok Aggarwal,et al.  Regularized Evolution for Image Classifier Architecture Search , 2018, AAAI.

[62]  Yi Yang,et al.  Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks , 2018, IJCAI.

[63]  Frank Sehnke,et al.  Parameter-exploring policy gradients , 2010, Neural Networks.

[64]  Song Han,et al.  Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.