FQ-Conv: Fully Quantized Convolution for Efficient and Accurate Inference

Deep neural networks (DNNs) can be made hardware-efficient by reducing the numerical precision of the network's weights and activations and by improving its resilience to noise. However, this gain in efficiency often comes at the cost of significantly reduced accuracy. In this paper, we present a novel approach to quantizing convolutional neural networks. The resulting networks perform all computations in low precision, without requiring higher-precision batch normalization (BN) or nonlinearities, while remaining highly accurate. To achieve this, we employ a novel quantization technique that learns to optimally quantize the weights and activations of the network during training. Additionally, to improve training convergence, we use a new training technique called gradual quantization. We leverage the nonlinear and normalizing behavior of our quantization function to effectively remove the higher-precision nonlinearities and BN from the network. The resulting convolutional layers are fully quantized to low precision, from input to output, making them ideal for neural network accelerators at the edge. We demonstrate the potential of this approach on different datasets and networks, showing that ternary-weight CNNs with low-precision inputs and outputs perform virtually on par with their full-precision equivalents. Finally, we analyze the influence of noise on the weights, activations, and convolution outputs (multiply-accumulates, MACs) and propose a strategy to improve network performance under noisy conditions.
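To make the core ingredients concrete, the following is a minimal sketch of quantization-aware training with a straight-through estimator (STE): ternary weight quantization plus a uniform activation quantizer with a learnable clipping range. This is an illustration under stated assumptions, not the paper's exact FQ-Conv method; the TWN-style 0.7·mean threshold, the learnable clip value `alpha`, and the class names `TernaryWeightSTE` and `LearnedActQuant` are choices made here for clarity.

```python
# Sketch of learned quantization for weights and activations (assumptions noted above).
import torch
import torch.nn as nn


class TernaryWeightSTE(torch.autograd.Function):
    """Quantize weights to {-scale, 0, +scale}; pass gradients straight through."""

    @staticmethod
    def forward(ctx, w):
        # Heuristic threshold from Ternary Weight Networks; the paper may learn
        # the quantization parameters instead of fixing them like this.
        delta = 0.7 * w.abs().mean()
        mask = (w.abs() > delta).float()
        scale = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)
        return scale * torch.sign(w) * mask

    @staticmethod
    def backward(ctx, grad_out):
        # Straight-through estimator: treat quantization as identity in backward.
        return grad_out


class LearnedActQuant(nn.Module):
    """Uniform activation quantizer with a learnable clipping range.

    The clip-and-quantize step is itself nonlinear, which is why a separate
    higher-precision nonlinearity can become unnecessary.
    """

    def __init__(self, bits: int = 4):
        super().__init__()
        self.levels = 2 ** bits - 1
        self.alpha = nn.Parameter(torch.tensor(6.0))  # learnable clip value (assumed init)

    def forward(self, x):
        x = torch.clamp(x, 0.0, self.alpha)        # clipping doubles as the nonlinearity
        step = self.alpha / self.levels
        q = torch.round(x / step) * step            # quantize to 2^bits - 1 levels
        return x + (q - x).detach()                 # STE for the rounding step
```

In use, a convolution would call `TernaryWeightSTE.apply(self.weight)` in its forward pass and feed its output through `LearnedActQuant`, so that both operands of the multiply-accumulate are low precision during training and inference.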
