Towards Efficient Model Compression via Learned Global Ranking

Pruning convolutional filters has demonstrated its effectiveness in compressing ConvNets. Prior art in filter pruning requires users to specify a target model complexity (e.g., model size or FLOP count) for the resulting architecture. However, determining a target model complexity can be difficult for optimizing various embodied AI applications such as autonomous robots, drones, and user-facing applications. First, both the accuracy and the speed of ConvNets can affect the performance of the application. Second, the performance of the application can be hard to assess without evaluating ConvNets during inference. As a consequence, finding a sweet-spot between the accuracy and speed via filter pruning, which needs to be done in a trial-and-error fashion, can be time-consuming. This work takes a first step toward making this process more efficient by altering the goal of model compression to producing a set of ConvNets with various accuracy and latency trade-offs instead of producing one ConvNet targeting some pre-defined latency constraint. To this end, we propose to learn a global ranking of the filters across different layers of the ConvNet, which is used to obtain a set of ConvNet architectures that have different accuracy/latency trade-offs by pruning the bottom-ranked filters. Our proposed algorithm, LeGR, is shown to be 2× to 3× faster than prior work while having comparable or better performance when targeting seven pruned ResNet-56 with different accuracy/FLOPs profiles on the CIFAR-100 dataset. Additionally, we have evaluated LeGR on ImageNet and Bird-200 with ResNet-50 and Mo- bileNetV2 to demonstrate its effectiveness. Code available at https://github.com/cmu-enyac/LeGR.

[1]  Tara N. Sainath,et al.  State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2]  Jitendra Malik,et al.  Habitat: A Platform for Embodied AI Research , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Song Han,et al.  Exploring the Regularity of Sparse Structure in Convolutional Neural Networks , 2017, ArXiv.

[4]  Haoyi Xiong,et al.  DELTA: DEep Learning Transfer using Feature Map with Attention for Convolutional Networks , 2019, ICLR.

[5]  Bingbing Ni,et al.  Variational Convolutional Neural Network Pruning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Larry S. Davis,et al.  NISP: Pruning Networks Using Neuron Importance Score Propagation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Atsuto Maki,et al.  Target Aware Network Adaptation for Efficient Representation Learning , 2018, ECCV Workshops.

[8]  Pavlo Molchanov,et al.  Importance Estimation for Neural Network Pruning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  David P. Wipf,et al.  Compressing Neural Networks using the Variational Information Bottleneck , 2018, ICML.

[10]  Ji Liu,et al.  ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Qi Wu,et al.  Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Jungong Han,et al.  Approximated Oracle Filter Pruning for Destructive CNN Width Optimization , 2019, ICML.

[13]  Elad Eban,et al.  MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[15]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[16]  Quoc V. Le,et al.  SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.

[17]  Song Han,et al.  AMC: AutoML for Model Compression and Acceleration on Mobile Devices , 2018, ECCV.

[18]  Liujuan Cao,et al.  Towards Optimal Structured CNN Pruning via Generative Adversarial Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Qi Tian,et al.  Accelerate CNN via Recursive Bayesian Pruning , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Jing Liu,et al.  Discrimination-aware Channel Pruning for Deep Neural Networks , 2018, NeurIPS.

[21]  Jiaxiang Wu,et al.  Collaborative Channel Pruning for Deep Networks , 2019, ICML.

[22]  Ali Farhadi,et al.  XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[23]  Sercan Ömer Arik,et al.  Resource-Efficient Neural Architect , 2018, ArXiv.

[24]  Kilian Q. Weinberger,et al.  CondenseNet: An Efficient DenseNet Using Learned Group Convolutions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Kurt Keutzer,et al.  Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Zhiqiang Shen,et al.  Learning Efficient Convolutional Networks through Network Slimming , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[28]  Bo Chen,et al.  Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Xiangyu Zhang,et al.  MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Max Welling,et al.  Bayesian Compression for Deep Learning , 2017, NIPS.

[31]  Huan Wang,et al.  Structured Probabilistic Pruning for Convolutional Neural Network Acceleration , 2017, BMVC.

[32]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[33]  Alok Aggarwal,et al.  Regularized Evolution for Image Classifier Architecture Search , 2018, AAAI.

[34]  Margaret Mitchell,et al.  VQA: Visual Question Answering , 2015, International Journal of Computer Vision.

[35]  Swagath Venkataramani,et al.  Bridging the Accuracy Gap for 2-bit Quantized Neural Networks (QNN) , 2018, ArXiv.

[36]  Yi Yang,et al.  Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks , 2018, IJCAI.

[37]  Jae-Joon Han,et al.  Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Jungong Han,et al.  Centripetal SGD for Pruning Very Deep Convolutional Networks With Complicated Structure , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Ping Liu,et al.  Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  James T. Kwok,et al.  Loss-aware Weight Quantization of Deep Networks , 2018, ICLR.

[41]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Quoc V. Le,et al.  EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.

[43]  Diana Marculescu,et al.  AdaScale: Towards Real-time Video Object Detection Using Adaptive Scaling , 2019, MLSys.

[44]  Max Welling,et al.  Learning Sparse Neural Networks through L0 Regularization , 2017, ICLR.

[45]  Svetlana Lazebnik,et al.  PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46]  Timo Aila,et al.  Pruning Convolutional Neural Networks for Resource Efficient Inference , 2016, ICLR.

[47]  Wei Wei,et al.  2019 Formatting Instructions for Authors Using LaTeX , 2018 .

[48]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[49]  Niraj K. Jha,et al.  ChamNet: Towards Efficient Network Design Through Platform-Aware Model Adaptation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Min Sun,et al.  DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures , 2018, ECCV.

[51]  Hanan Samet,et al.  Pruning Filters for Efficient ConvNets , 2016, ICLR.

[52]  Bo Chen,et al.  MnasNet: Platform-Aware Neural Architecture Search for Mobile , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  James Zijun Wang,et al.  Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers , 2018, ICLR.

[54]  Diana Marculescu,et al.  Regularizing Activation Distribution for Training Binarized Deep Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Lucas Theis,et al.  Faster gaze prediction with dense networks and Fisher pruning , 2018, ArXiv.

[56]  Bo Chen,et al.  NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications , 2018, ECCV.

[57]  Jie Liu,et al.  Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours , 2019, ECML/PKDD.

[58]  Song Han,et al.  ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware , 2018, ICLR.

[59]  Xin Yuan,et al.  Enhanced Bayesian Compression via Deep Reinforcement Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Yiming Yang,et al.  Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.

[61]  Yiran Chen,et al.  Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.

[62]  Jianxin Wu,et al.  ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[63]  Diana Marculescu,et al.  Designing Adaptive Neural Networks for Energy-Constrained Image Classification , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[64]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[65]  Wenzhi Cui,et al.  MAVBench: Micro Aerial Vehicle Benchmarking , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[66]  Naiyan Wang,et al.  Data-Driven Sparse Structure Selection for Deep Neural Networks , 2017, ECCV.

[67]  Y. Nesterov A method for solving the convex programming problem with convergence rate O(1/k^2) , 1983 .

[68]  Song Han,et al.  Trained Ternary Quantization , 2016, ICLR.

[69]  Yuchun Ma,et al.  AddressNet: Shift-Based Primitives for Efficient Convolutional Neural Networks , 2019, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[70]  Sanja Fidler,et al.  MovieQA: Understanding Stories in Movies through Question-Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[71]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[72]  Rongrong Ji,et al.  Accelerating Convolutional Networks via Global & Dynamic Filter Pruning , 2018, IJCAI.

[73]  Diana Marculescu,et al.  FLightNNs: Lightweight Quantized Deep Neural Networks for Fast and Accurate Inference , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).

[74]  Xiangyu Zhang,et al.  Channel Pruning for Accelerating Very Deep Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[75]  Zhiru Zhang,et al.  Building Efficient Deep Neural Networks With Unitary Group Convolutions , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).