REAF: Remembering Enhancement and Entropy-Based Asymptotic Forgetting for Filter Pruning

Neurologically, filter pruning is a procedure of forgetting followed by remembering recovery. Prevailing methods directly forget less important information from an unrobust baseline at the outset and expect to minimize the performance sacrifice. However, unsaturated base remembering imposes a ceiling on the slimmed model, leading to suboptimal performance, and drastic forgetting at the start causes unrecoverable information loss. Here, we design a novel filter pruning paradigm termed Remembering Enhancement and Entropy-based Asymptotic Forgetting (REAF). Inspired by robustness theory, we first enhance remembering by over-parameterizing the baseline with fusible compensatory convolutions, which liberates the pruned model from the bondage of the baseline at no inference cost. The collateral implication between original and compensatory filters then necessitates a bilateral-collaborated pruning criterion: a filter and its compensatory counterpart are preserved only when the filter has the largest intra-branch distance and the counterpart has the strongest remembering-enhancement power. Furthermore, Ebbinghaus-curve-based asymptotic forgetting is proposed to protect the pruned model from unstable learning: the number of pruned filters increases asymptotically during training, so the remembering carried by the pretrained weights is gradually concentrated in the remaining filters. Extensive experiments demonstrate the superiority of REAF over many state-of-the-art (SOTA) methods. For example, REAF removes 47.55% of the FLOPs and 42.98% of the parameters of ResNet-50 with only a 0.98% top-1 accuracy loss on ImageNet. The code is available at https://github.com/zhangxin-xd/REAF.
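To make the "fusible at no inference cost" claim concrete, below is a minimal PyTorch sketch of pairing a 3x3 convolution with a parallel 1x1 compensatory branch and collapsing the pair back into a single 3x3 convolution, in the spirit of structural re-parameterization. The class name `CompensatedConv`, the 1x1 kernel size, and the omission of batch normalization are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompensatedConv(nn.Module):
    """A 3x3 conv over-parameterized with a parallel 1x1 compensatory branch.
    Both branches are active during training; for inference they fuse into
    a single 3x3 conv, so the extra capacity costs nothing at test time.
    (Hypothetical sketch of the idea; BN handling is omitted.)"""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.main = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=True)
        self.comp = nn.Conv2d(in_ch, out_ch, 1, bias=True)  # compensatory branch

    def forward(self, x):
        return self.main(x) + self.comp(x)

    def fuse(self):
        """Collapse the two parallel branches into one 3x3 conv. A 1x1 kernel
        equals a 3x3 kernel that is zero everywhere except its center, so the
        two weight tensors simply add."""
        fused = nn.Conv2d(self.main.in_channels, self.main.out_channels,
                          3, padding=1, bias=True)
        w = self.main.weight.data.clone()
        w += F.pad(self.comp.weight.data, [1, 1, 1, 1])  # center the 1x1 kernel
        fused.weight.data = w
        fused.bias.data = self.main.bias.data + self.comp.bias.data
        return fused
```

Since convolution is linear, `m(x)` and `m.fuse()(x)` agree up to floating-point error (e.g., `torch.allclose(m(x), m.fuse()(x), atol=1e-6)` holds), which is exactly why the compensatory capacity is free at inference time.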
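The bilateral-collaborated criterion can likewise be sketched as a joint ranking over the two branches. The scoring below (summed pairwise distances as the intra-branch distance, the L2 norm of the compensatory filter as its enhancement power, and rank addition to combine the two views) is one reading of the abstract, not the paper's exact formula; `bilateral_keep_mask` and `keep_ratio` are hypothetical names.

```python
import torch

def bilateral_keep_mask(main_w, comp_w, keep_ratio=0.5):
    """Toy bilateral criterion: keep a filter only if it is distant from its
    peers within its own branch AND its compensatory counterpart is strong.
    main_w, comp_w: (out_ch, in_ch, k, k) weight tensors of the two branches."""
    flat = main_w.flatten(1)                        # (out_ch, d)
    dist = torch.cdist(flat, flat).sum(dim=1)       # intra-branch distance per filter
    power = comp_w.flatten(1).norm(dim=1)           # enhancement power of counterpart
    k = int(main_w.shape[0] * keep_ratio)
    # rank filters under both views (0 = weakest); keep the top combined ranks
    score = dist.argsort().argsort() + power.argsort().argsort()
    keep = torch.zeros(main_w.shape[0], dtype=torch.bool)
    keep[score.topk(k).indices] = True
    return keep
```

Keeping only the top combined ranks enforces the "only when both" condition in spirit: a filter that scores high on intra-branch distance but whose counterpart is weak (or vice versa) is ranked down and pruned.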
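Finally, the asymptotic forgetting schedule can be illustrated with the classic Ebbinghaus retention curve R(t) = exp(-t/S): the fraction of filters forgotten by epoch t grows as 1 - exp(-t/S), so pruning approaches the target ratio gradually rather than all at once. The function name and the default strength S are assumptions for illustration, not the paper's values.

```python
import math

def pruned_filters_at_epoch(epoch, total_filters, target_ratio, strength=10.0):
    """Ebbinghaus-style schedule: retention R(t) = exp(-t / S), so the
    forgotten (pruned) fraction 1 - exp(-t / S) rises asymptotically toward
    the target ratio. `strength` (S) controls how fast forgetting saturates."""
    forgotten = 1.0 - math.exp(-epoch / strength)   # in [0, 1), -> 1 asymptotically
    return int(round(total_filters * target_ratio * forgotten))
```

For example, with 64 filters and a 50% target, this schedule prunes 3 filters after the first epoch, about 20 by epoch 10, and saturates near the 32-filter target by epoch 50, concentrating the pretrained remembering into the survivors step by step.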
