Efficient non-uniform quantizer for quantized neural network targeting reconfigurable hardware

Convolutional Neural Networks (CNNs) have become a popular choice for tasks such as computer vision, speech recognition and natural language processing. Thanks to their large computational capability and throughput, GPUs are the most common platform for both training and inference, but they are not power efficient and are therefore ill-suited to low-power systems such as mobile devices. Recent studies have shown that FPGAs can be a good alternative to GPUs as CNN accelerators, due to their reconfigurable nature, low power consumption and low latency. For FPGA-based accelerators to outperform GPUs in inference tasks, both the parameters of the network and its activations must be quantized. While most works use uniform quantizers for both parameters and activations, a uniform quantizer is not always optimal, and non-uniform quantizers need to be considered. In this work we introduce a custom hardware-friendly approach to implementing non-uniform quantizers. In addition, we use a single-scale integer representation of both parameters and activations, for both training and inference. The combined method yields a hardware-efficient non-uniform quantizer fit for real-time applications. We tested our method on the CIFAR-10 and CIFAR-100 image classification datasets with ResNet-18 and VGG-like architectures and observed little degradation in accuracy.
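To make the uniform versus non-uniform distinction concrete, here is a minimal Python sketch. It is not the quantizer proposed in this work, whose details are not given here; the level table LEVELS, the scale values and all function names are hypothetical, and we assume one scale per tensor. In both quantizers a value is represented as an integer times a single scale, so a dot product can be accumulated in integer arithmetic and rescaled only once at the end.

```python
import bisect

# Hypothetical 3-bit non-uniform level table (8 integer levels), denser near
# zero, where weights and activations tend to concentrate. Must stay sorted.
LEVELS = [-8, -4, -2, -1, 0, 1, 2, 4]

def uniform_quantize(x, scale, bits):
    """Uniform quantizer: round x/scale to an integer in the signed b-bit range.
    Python's round() breaks exact ties to the nearest even integer."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = round(x / scale)
    return max(qmin, min(qmax, q))

def nonuniform_quantize(x, scale, levels):
    """Non-uniform quantizer: map x/scale to the nearest entry of a sorted
    integer level table. All levels share one scale, so the result stays an
    integer and the real value is recovered as level * scale."""
    t = x / scale
    i = bisect.bisect_left(levels, t)
    if i == 0:                  # below the smallest level: clamp
        return levels[0]
    if i == len(levels):        # above the largest level: clamp
        return levels[-1]
    lo, hi = levels[i - 1], levels[i]
    # Pick the closer neighbour; ties go to the lower level.
    return lo if t - lo <= hi - t else hi

def int_dot(w_levels, a_levels, w_scale, a_scale):
    """Integer multiply-accumulate; the two scales are applied once at the end."""
    acc = sum(w * a for w, a in zip(w_levels, a_levels))
    return acc * (w_scale * a_scale)

if __name__ == "__main__":
    w = [nonuniform_quantize(v, 0.25, LEVELS) for v in (0.3, 0.8, -0.5)]
    a = [uniform_quantize(v, 0.5, 4) for v in (1.0, 0.5, 0.5)]
    print(w, a, int_dot(w, a, 0.25, 0.5))
```

On reconfigurable hardware, a small table like LEVELS could plausibly live in a lookup table so that only the short code index needs to be stored per value; this is an illustrative design choice for the sketch, not a description of the architecture used in this work.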
