Scalable Model Compression by Entropy Penalized Reparameterization

We describe a simple and general neural network weight compression approach, in which the network parameters (weights and biases) are represented in a "latent" space, amounting to a reparameterization. This space is equipped with a learned probability model, which is used both to impose an entropy penalty on the parameter representation during training and to compress the representation using a simple arithmetic coder after training. Classification accuracy and model compressibility are maximized jointly, with the bitrate–accuracy trade-off specified by a hyperparameter. We evaluate the method on the MNIST, CIFAR-10, and ImageNet classification benchmarks using six distinct model architectures. Our results show that state-of-the-art model compression can be achieved in a scalable and general way without requiring complex procedures such as multi-stage training.
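
The joint objective described above can be sketched in code. The following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the `ReparamLinear` module, the per-tensor Gaussian standing in for the learned probability model, the additive-uniform-noise surrogate for quantization, and the `lam` value are all illustrative assumptions.

```python
# Hypothetical sketch of entropy-penalized reparameterization (not the paper's code).
import math
import torch
import torch.nn.functional as F

class ReparamLinear(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Latent representation of the layer weights (the "reparameterization").
        self.latent = torch.nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.bias = torch.nn.Parameter(torch.zeros(out_features))
        # Parameters of a simple per-tensor Gaussian probability model,
        # learned jointly with the weights (a stand-in for the paper's model).
        self.loc = torch.nn.Parameter(torch.zeros(()))
        self.log_scale = torch.nn.Parameter(torch.zeros(()))

    def rate_bits(self):
        # Additive uniform noise approximates scalar quantization during training.
        noisy = self.latent + torch.rand_like(self.latent) - 0.5
        dist = torch.distributions.Normal(self.loc, self.log_scale.exp())
        # Estimated code length in bits under the probability model.
        return -dist.log_prob(noisy).sum() / math.log(2.0)

    def forward(self, x):
        # For brevity the latent is used directly as the weight tensor;
        # the paper decodes the latent representation back to weights.
        return F.linear(x, self.latent, self.bias)

# Joint objective: classification loss plus lam * estimated bitrate.
lam = 1e-4  # hypothetical bitrate--accuracy trade-off hyperparameter
layer = ReparamLinear(784, 10)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = F.cross_entropy(layer(x), y) + lam * layer.rate_bits()
loss.backward()
```

After training, the (quantized) latent tensors would be entropy-coded under the learned probability model, e.g. with an arithmetic coder, so the rate term above directly estimates the compressed model size.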
