DepGraph: Towards Any Structural Pruning

Structural pruning enables model acceleration by removing structurally grouped parameters from neural networks. However, parameter-grouping patterns vary widely across models, making architecture-specific pruners, which rely on manually designed grouping schemes, non-generalizable to new architectures. In this work, we study a highly challenging yet barely explored task, any structural pruning, to tackle general structural pruning of arbitrary architectures such as CNNs, RNNs, GNNs and Transformers. The most prominent obstacle toward this goal lies in structural coupling, which not only forces different layers to be pruned simultaneously, but also expects all removed parameters to be consistently unimportant, thereby avoiding structural issues and significant performance degradation after pruning. To address this problem, we propose a general and fully automatic method, Dependency Graph (DepGraph), to explicitly model the dependencies between layers and comprehensively group coupled parameters for pruning. We extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet and Vision Transformer for images, GAT for graphs, DGCNN for 3D point clouds, and LSTM for language, and demonstrate that, even with a simple norm-based criterion, the proposed method consistently yields gratifying performance.
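The grouping idea the abstract describes can be illustrated with a minimal sketch: treat layers as graph nodes, add an edge wherever two layers' channels are structurally coupled (e.g. a convolution and its BatchNorm, or the two sides of a residual add), and take connected components as groups that must be pruned together. The layer names and coupling pairs below are hypothetical toy examples for illustration, not the paper's actual implementation.

```python
from collections import defaultdict

def build_groups(couplings):
    """Group layers whose pruned channels must stay consistent.

    `couplings` lists (layer_a, layer_b) pairs whose output channels
    are structurally coupled. Connected components of the resulting
    undirected graph are the pruning groups: removing channel i from
    one member requires removing channel i from every member.
    """
    adj = defaultdict(set)
    for a, b in couplings:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for node in adj:
        if node in seen:
            continue
        # Depth-first traversal to collect one connected component.
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        groups.append(sorted(comp))
    return sorted(groups)

# Toy residual block: conv1 -> bn1 -> conv2 -> bn2, with a skip
# connection adding the block input (produced by "conv0") to bn2's
# output, which couples conv0 with conv2/bn2.
couplings = [
    ("conv1", "bn1"),  # BN channels follow conv1's output channels
    ("conv2", "bn2"),
    ("bn2", "conv0"),  # residual add: both sides need identical channels
]
print(build_groups(couplings))
# -> [['bn1', 'conv1'], ['bn2', 'conv0', 'conv2']]
```

This captures only the grouping step; the full method also propagates dependencies through operations such as concatenation, splitting and flattening, and scores each group's importance before pruning.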
