Customizing Low-Precision Deep Neural Networks for FPGAs

In this paper, we argue that instead of focusing solely on developing efficient architectures to accelerate well-known low-precision CNNs, we should also seek to modify the network to suit the FPGA. We develop a fully automated toolflow that modifies the network through filter pruning, so that it utilizes the FPGA hardware efficiently while satisfying a predefined accuracy threshold. Although fewer weights are removed than with traditional pruning techniques designed for software implementations, the overall model complexity and feature map storage are greatly reduced. We implement the AlexNet and TinyYolo networks on the large-scale ImageNet and PascalVOC datasets to demonstrate up to roughly 2× speedup in frames per second and 2× reduction in resource requirements over the original network, with equal or improved accuracy.
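
As a rough illustration of pruning filters under an accuracy threshold, the sketch below removes the weakest filters of a single layer in hardware-sized groups and stops as soon as accuracy would fall below the threshold. It is not the toolflow described in this paper: ranking filters by a per-filter norm, keeping the filter count at a multiple of a `parallelism` factor, and the `evaluate` callback are all assumptions made for the example.

```python
def prune_layer(filter_norms, parallelism, evaluate, threshold):
    """Greedy filter pruning for one layer (illustrative sketch only).

    filter_norms: one importance score per filter (assumed: larger = more important).
    parallelism:  hypothetical number of filters the hardware processes at once.
    evaluate:     hypothetical callback mapping kept filter indices to accuracy.
    threshold:    minimum acceptable accuracy.
    Returns the sorted indices of the filters that are kept.
    """
    keep = list(range(len(filter_norms)))
    while True:
        # First align the count to a multiple of `parallelism`,
        # then remove one full group per iteration.
        step = len(keep) % parallelism or parallelism
        if len(keep) <= step:
            break
        # Rank the remaining filters from weakest to strongest.
        ranked = sorted(keep, key=lambda i: filter_norms[i])
        candidate = sorted(ranked[step:])
        # Stop before accuracy drops below the threshold.
        if evaluate(candidate) < threshold:
            break
        keep = candidate
    return keep


# Toy usage: accuracy is faked as a function of how many filters remain.
norms = [0.9, 0.1, 0.5, 0.3, 0.8, 0.2, 0.7, 0.4, 0.6, 0.05]
toy_accuracy = lambda kept: 1.0 if len(kept) >= 8 else 0.5
kept = prune_layer(norms, parallelism=4, evaluate=toy_accuracy, threshold=0.9)
```

In this toy run the two weakest filters are dropped to reach eight, a multiple of four, and the next group of four is rejected because the faked accuracy would fall below the threshold. A real flow would retrain and measure accuracy on a validation set inside `evaluate`.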
