Exploration of Low Numeric Precision Deep Learning Inference Using Intel® FPGAs

Convolutional neural networks (CNNs) have been shown to maintain reasonable classification accuracy when quantized to lower precisions; however, quantizing to sub-8-bit activations and weights can cause classification accuracy to fall below an acceptable threshold. Techniques exist for closing the accuracy gap of limited numeric precision networks, typically by increasing computation. This creates a trade-off between throughput and accuracy that can be tailored to different networks through various combinations of activation and weight data widths. Customizable hardware architectures such as FPGAs enable data-width-specific computation through unique logic configurations, leading to highly optimized processing that is unattainable with full-precision networks. Specifically, ternary and binary weighted networks offer an efficient method of inference for 2-bit and 1-bit data, respectively. Most hardware architectures can take advantage of the memory storage and bandwidth savings that come with a smaller datapath, but very few can take full advantage of limited numeric precision at the computation level. In this paper, we present a hardware design for FPGAs that takes advantage of the bandwidth, memory, power, and computation savings of limited numeric precision data. We provide insights into the trade-offs between throughput and accuracy for various networks and show how they map to our framework. Further, we show how limited numeric precision computation can be efficiently mapped onto FPGAs for both the ternary and binary cases. Starting with an Arria 10 device, we demonstrate in hardware an AlexNet with 2-bit activations and ternary weights that achieves 3,700 images per second on the ImageNet dataset with a top-1 accuracy of 0.49. Using a hardware modeler designed for our low numeric precision framework, we project performance, most notably for a 55.5 TOPS Stratix 10 device running a modified ResNet-34 with only 3.7% accuracy degradation relative to single precision.
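The ternary case is central to the AlexNet result above. As a minimal sketch of how full-precision weights are commonly reduced to {-1, 0, +1} plus a per-tensor scale, the following Python assumes a threshold-based ternarization in the style of ternary weight networks; the 0.7 threshold factor and the scaling rule are common heuristics used here for illustration, not details taken from this paper.

```python
import numpy as np

def ternarize(weights: np.ndarray, thresh_factor: float = 0.7):
    """Quantize full-precision weights to {-1, 0, +1} plus a scale.

    Threshold-based ternarization: weights whose magnitude falls below
    a per-tensor threshold become 0; the rest keep only their sign.
    The scale alpha is the mean magnitude of the surviving weights.
    Both choices are illustrative assumptions, not this paper's scheme.
    """
    delta = thresh_factor * np.mean(np.abs(weights))  # per-tensor threshold
    ternary = np.where(np.abs(weights) > delta, np.sign(weights), 0.0)
    mask = ternary != 0
    alpha = np.abs(weights[mask]).mean() if mask.any() else 0.0
    return ternary.astype(np.int8), alpha

# Example: ternarize a small random 3x3 convolution kernel.
w = np.random.randn(3, 3).astype(np.float32)
q, alpha = ternarize(w)
print(q, alpha)
```

With weights restricted to {-1, 0, +1}, the convolution's multiplies collapse to sign flips and skips, which is what lets width-specific FPGA logic replace full multipliers.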

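For the binary case, the efficiency claim rests on the identity that a dot product of two {-1, +1} vectors reduces to an XNOR followed by a popcount, both of which map directly onto FPGA logic. A minimal sketch of that identity, with the bit packing and vector length chosen purely for illustration:

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors packed as bits
    (bit = 1 encodes +1, bit = 0 encodes -1).

    The multiply becomes XNOR and the accumulate becomes popcount:
    dot = (#matching bits) - (#mismatching bits) = 2*popcount(xnor) - n
    """
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask  # 1 wherever the elements match
    return 2 * bin(xnor).count("1") - n

# Example: two packed 4-element vectors whose dot product is 0.
print(binary_dot(0b1011, 0b1101, 4))
```

On an FPGA the popcount becomes an adder tree over LUTs, which is how binarized networks trade scarce DSP multipliers for inexpensive logic.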