Bayesian Automatic Model Compression

Model compression has drawn great attention in the deep learning community. A core problem in model compression is determining the optimal layer-wise compression policy, e.g., the per-layer bit-width in network quantization. Conventional hand-crafted heuristics rely on human experts and are usually sub-optimal, while recent reinforcement-learning-based approaches can be inefficient when exploring the policy space. In this article, we propose Bayesian automatic model compression (BAMC), which leverages non-parametric Bayesian methods to learn the optimal quantization bit-width for each layer of the network. BAMC is trained in a one-shot manner, avoiding the back-and-forth (re-)training required by reinforcement-learning-based approaches. Experimental results on various datasets validate that the proposed method finds reasonable quantization policies efficiently, with little accuracy drop for the quantized network.
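
To make the idea concrete, below is a minimal sketch (not the authors' implementation) of how a layer-wise bit-width could be modeled with a truncated stick-breaking (Dirichlet-process-style) variational posterior and trained in one shot with a straight-through estimator. The class and function names, the candidate bit-widths, and the symmetric uniform quantizer are all assumptions made for illustration.

```python
# Illustrative sketch only: each layer keeps a truncated stick-breaking posterior
# over candidate bit-widths, and a straight-through estimator passes gradients
# through the quantizer. Names like `BayesianQuantLayer` are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


def uniform_quantize(w, bits):
    """Symmetric uniform quantization with a straight-through gradient."""
    levels = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / levels
    w_q = torch.round(w / scale) * scale
    # Forward pass uses w_q; backward pass treats the quantizer as identity.
    return w + (w_q - w).detach()


class BayesianQuantLayer(nn.Module):
    """Conv layer whose bit-width is a categorical variable with a
    stick-breaking variational posterior (truncated to len(bit_choices))."""

    def __init__(self, in_ch, out_ch, bit_choices=(2, 4, 8)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.bit_choices = bit_choices
        # Logits of the stick-breaking fractions v_k (learned variational params).
        self.v_logits = nn.Parameter(torch.zeros(len(bit_choices) - 1))

    def bit_probs(self):
        v = torch.sigmoid(self.v_logits)            # stick fractions in (0, 1)
        remaining = torch.cumprod(1 - v, dim=0)     # leftover stick after each break
        one = torch.ones(1, device=v.device)
        # p_k = v_k * prod_{j<k}(1 - v_j); the last component takes the remainder.
        probs = torch.cat([v, one]) * torch.cat([one, remaining])
        return probs                                # sums to 1

    def forward(self, x):
        probs = self.bit_probs()
        # Expected quantized weight under the posterior (one-shot; no RL rollouts).
        w = sum(p * uniform_quantize(self.conv.weight, b)
                for p, b in zip(probs, self.bit_choices))
        return F.conv2d(x, w, self.conv.bias, padding=1)
```

Under this reading, each layer's posterior would concentrate on a single bit-width as training proceeds, and the arg-max component could then be read out as that layer's quantization policy; this readout step is likewise an assumption of the sketch.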
