UNIQ

We present a novel method for neural network quantization that emulates a non-uniform $k$-quantile quantizer, which adapts to the distribution of the quantized parameters. Our approach offers an alternative to existing uniform quantization techniques for neural networks. We suggest comparing results as a function of the bit operations (BOPS) performed, assuming the availability of a look-up table for the non-uniform case. In this setup, we show the advantages of our strategy in the low computational budget regime. While the proposed solution is harder to implement in hardware, we believe it sets a basis for new approaches to neural network quantization.
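To make the idea concrete, here is a minimal NumPy sketch of a $k$-quantile quantizer in the spirit described above: bin edges are placed at the empirical quantiles of the parameters so each bin holds roughly the same number of values, and each bin is represented by one entry of a look-up table. The function name `quantile_quantize` and the choice of the bin median as the representative are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def quantile_quantize(w, k):
    """Illustrative k-quantile quantizer (a sketch, not the paper's method).

    Bin edges sit at the 0, 1/k, ..., 1 quantiles of w, so each bin
    covers roughly len(w)/k values; each bin is represented by the
    median of the values it contains, which forms the look-up table
    used at inference time.
    """
    # Edges at evenly spaced quantile fractions of the data.
    edges = np.quantile(w, np.linspace(0.0, 1.0, k + 1))
    # Assign each value to a bin index in 0..k-1.
    idx = np.clip(np.searchsorted(edges, w, side="right") - 1, 0, k - 1)
    # Per-bin representative; fall back to the bin midpoint if a bin
    # is empty (possible when the data has many tied values).
    levels = np.array([
        np.median(w[idx == b]) if np.any(idx == b)
        else 0.5 * (edges[b] + edges[b + 1])
        for b in range(k)
    ])
    return levels[idx], levels

# Example: quantize Gaussian-like weights to k = 4 levels (2 bits).
weights = np.random.randn(10_000).astype(np.float32)
quantized, lut = quantile_quantize(weights, k=4)
print("look-up table:", lut)
```

Because the levels follow the parameter distribution rather than a fixed uniform grid, heavily populated regions of the distribution get finer resolution; the cost is that multiplications against such levels require a look-up table rather than simple shift-based arithmetic, which is the hardware trade-off noted above.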
