Compact CNN Structure Learning by Knowledge Distillation

Compressing deep Convolutional Neural Networks (CNNs) is essential for operating within the limited computation, power, and memory budgets of embedded devices. However, existing methods achieve this objective at the cost of a drop in inference accuracy on computer vision tasks. To address this drawback, we propose a framework that leverages knowledge distillation along with customizable block-wise optimization to learn a lightweight CNN structure while preserving better control over the compression-performance tradeoff. Given specific resource constraints, e.g., floating-point operations per inference (FLOPs) or model parameters, our method delivers state-of-the-art network compression while achieving better inference accuracy. In a comprehensive evaluation, we demonstrate that our method is effective and robust, yielding consistent results across a variety of network architectures and datasets, at negligible training overhead. In particular, for the already compact MobileNet_v2, our method offers up to 2× and 5.2× better model compression in terms of FLOPs and model parameters, respectively, while achieving 1.05% higher accuracy than the baseline network.
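
To make the knowledge-distillation component concrete, the sketch below shows a standard Hinton-style distillation loss in PyTorch: the compact student network is trained against a blend of the teacher's temperature-softened outputs and the ground-truth labels. This is a minimal illustration under assumed settings, not the paper's block-wise optimization procedure; the function name and the values of `T` and `alpha` are illustrative choices.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style knowledge distillation loss (illustrative sketch).

    Mixes a temperature-softened KL term against the teacher's outputs
    with ordinary cross-entropy on the true labels. T and alpha are
    example hyperparameters, not the paper's settings.
    """
    # Soft targets: KL divergence between the softened student and
    # teacher distributions, scaled by T^2 so gradient magnitudes
    # stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```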
