Bayesian Automatic Model Compression

Model compression has drawn great attention in the deep learning community. A core problem in model compression is determining the optimal layer-wise compression policy, e.g., the per-layer bit-width in network quantization. Conventional hand-crafted heuristics rely on human experts and are usually sub-optimal, while recent reinforcement-learning-based approaches can be inefficient when exploring the policy space. In this article, we propose Bayesian automatic model compression (BAMC), which leverages non-parametric Bayesian methods to learn the optimal quantization bit-width for each layer of the network. BAMC is trained in a one-shot manner, avoiding the back-and-forth (re-)training required by reinforcement-learning-based approaches. Experimental results on various datasets validate that the proposed method finds reasonable quantization policies efficiently, with little accuracy drop for the quantized network.
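
To make the idea concrete, below is a minimal sketch (not the authors' implementation) of how a layer-wise bit-width could be modeled with a truncated stick-breaking (Dirichlet-process-style) variational posterior and trained in one shot with a straight-through estimator. The class and function names, the candidate bit-widths, and the symmetric uniform quantizer are all assumptions made for illustration.

```python
# Illustrative sketch only: each layer keeps a truncated stick-breaking posterior
# over candidate bit-widths, and a straight-through estimator passes gradients
# through the quantizer. Names like `BayesianQuantLayer` are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


def uniform_quantize(w, bits):
    """Symmetric uniform quantization with a straight-through gradient."""
    levels = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / levels
    w_q = torch.round(w / scale) * scale
    # Forward pass uses w_q; backward pass treats the quantizer as identity.
    return w + (w_q - w).detach()


class BayesianQuantLayer(nn.Module):
    """Conv layer whose bit-width is a categorical variable with a
    stick-breaking variational posterior (truncated to len(bit_choices))."""

    def __init__(self, in_ch, out_ch, bit_choices=(2, 4, 8)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.bit_choices = bit_choices
        # Logits of the stick-breaking fractions v_k (learned variational params).
        self.v_logits = nn.Parameter(torch.zeros(len(bit_choices) - 1))

    def bit_probs(self):
        v = torch.sigmoid(self.v_logits)            # stick fractions in (0, 1)
        remaining = torch.cumprod(1 - v, dim=0)     # leftover stick after each break
        one = torch.ones(1, device=v.device)
        # p_k = v_k * prod_{j<k}(1 - v_j); the last component takes the remainder.
        probs = torch.cat([v, one]) * torch.cat([one, remaining])
        return probs                                # sums to 1

    def forward(self, x):
        probs = self.bit_probs()
        # Expected quantized weight under the posterior (one-shot; no RL rollouts).
        w = sum(p * uniform_quantize(self.conv.weight, b)
                for p, b in zip(probs, self.bit_choices))
        return F.conv2d(x, w, self.conv.bias, padding=1)
```

Under this reading, each layer's posterior would concentrate on a single bit-width as training proceeds, and the arg-max component could then be read out as that layer's quantization policy; this readout step is likewise an assumption of the sketch.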
