Optimizing Information Theory Based Bitwise Bottlenecks for Efficient Mixed-Precision Activation Quantization

Recent research on information theory has shed new light on continuing attempts to open the black box of neural signal encoding. Inspired by the problem of lossy signal compression in wireless communication, this paper presents a Bitwise Bottleneck approach for quantizing and encoding neural network activations. Building on rate-distortion theory, the Bitwise Bottleneck identifies the most significant bits in the activation representation by assigning and approximating the sparse coefficients associated with the individual bits. Under the constraint of a limited average code rate, the bottleneck minimizes distortion for optimal activation quantization in a flexible, layer-by-layer manner. Experiments on ImageNet and other datasets show that, by minimizing the quantization distortion of each layer, the network with bottlenecks achieves state-of-the-art accuracy with low-precision activations. Moreover, by reducing the code rate, the proposed method improves memory and computational efficiency by more than six times compared with a deep neural network using the standard single-precision representation. The source code is available on GitHub: https://github.com/CQUlearningsystemgroup/BitwiseBottleneck.
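To make the idea concrete, the sketch below illustrates one way a bitwise bottleneck of this kind could operate: activations are uniformly quantized, decomposed into bit planes, and only the most significant planes are retained, with a coefficient re-fitted to each retained plane so that the reconstruction distortion is minimized under the reduced bit budget. This is a minimal illustrative sketch, not the authors' implementation; the function name `bitwise_bottleneck`, the bit counts, and the use of an ordinary least-squares fit for the per-bit coefficients are assumptions introduced here for clarity.

```python
# Illustrative sketch (assumption, not the paper's code): approximate activations
# with a limited number of most-significant bit planes, fitting a coefficient per
# retained plane so that the mean-squared distortion is minimized.
import numpy as np

def bitwise_bottleneck(x, total_bits=8, kept_bits=4):
    """Quantize non-negative activations x (e.g. post-ReLU) to `total_bits`,
    then approximate them with the `kept_bits` most significant bit planes,
    re-fitting one scale coefficient per retained plane by least squares."""
    x = np.asarray(x, dtype=np.float64)
    scale = x.max() / (2 ** total_bits - 1)
    q = np.round(x / (scale + 1e-12)).astype(np.int64)        # integer codes
    # Bit-plane decomposition: q = sum_b 2^b * plane_b
    planes = [((q >> b) & 1).astype(np.float64) for b in range(total_bits)]
    planes = planes[::-1]                                      # MSB first
    kept = planes[:kept_bits]                                  # sparse bit selection
    # Least-squares fit of per-bit coefficients c to minimize ||x - A c||^2
    A = np.stack([p.ravel() for p in kept], axis=1)
    c, *_ = np.linalg.lstsq(A, x.ravel(), rcond=None)
    x_hat = (A @ c).reshape(x.shape)
    distortion = np.mean((x - x_hat) ** 2)
    return x_hat, c, distortion

if __name__ == "__main__":
    acts = np.random.rand(4, 16) * 6.0                         # synthetic ReLU-like activations
    approx, coeffs, mse = bitwise_bottleneck(acts, total_bits=8, kept_bits=4)
    print("per-bit coefficients:", coeffs, "distortion (MSE):", mse)
```

In this reading, lowering `kept_bits` corresponds to tightening the average code rate, and the fitted coefficients play the role of the sparse per-bit coefficients that the paper optimizes layer by layer.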
