Hybrid Dot-Product Calculation for Convolutional Neural Networks in FPGA

Convolutional Neural Networks (CNNs) are widely used in edge devices for applications such as security and surveillance. Running CNNs on embedded devices is a design challenge because these models require high computing power and large memory storage. Data quantization is an optimization technique applied to CNNs to reduce both requirements. It reduces the number of bits used to represent weights and activations, which in turn reduces the size of the operands and of the memory needed to store them. The technique is more effective with hybrid quantization, in which data in different layers may have different bit widths. This article proposes a new hardware module to calculate dot-products of CNNs with hybrid quantization. The module improves the implementation of CNNs in low-density FPGAs, since the same module runs the dot-products of different layers with different data quantizations. We show implementation results on a ZYNQ7020 and compare them with state-of-the-art works. The proposed module achieves improvements in area and performance.
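As a purely illustrative software sketch of hybrid quantization (not the hardware module proposed in the article), the Python below quantizes weights and activations to per-layer bit widths and computes an integer dot-product with one shared routine. The layer names, the bit widths in LAYER_BITS, the scale factors, and the symmetric saturating rounding scheme are all assumptions made for this example.

```python
# Illustrative sketch only: layer names, bit widths, and scales are hypothetical.

def quantize(values, bits, scale):
    """Map real values to signed integers of `bits` bits (symmetric, saturating)."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return [max(lo, min(hi, round(v / scale))) for v in values]


def dot(weights_q, acts_q):
    """Integer dot-product of two equally long quantized vectors."""
    if len(weights_q) != len(acts_q):
        raise ValueError("operand vectors must have the same length")
    return sum(w * a for w, a in zip(weights_q, acts_q))


# Hypothetical hybrid configuration: (weight bits, activation bits) per layer.
LAYER_BITS = {"conv1": (8, 8), "conv2": (4, 8), "fc": (2, 8)}


def layer_dot(layer, weights, acts, w_scale, a_scale):
    """Quantize with the layer's own bit widths, then reuse the same dot routine."""
    w_bits, a_bits = LAYER_BITS[layer]
    w_q = quantize(weights, w_bits, w_scale)
    a_q = quantize(acts, a_bits, a_scale)
    # Rescale the integer accumulator back to the real-valued domain.
    return dot(w_q, a_q) * w_scale * a_scale


def weight_storage_bits(layer, n_weights):
    """Memory needed for a layer's weights at its configured bit width."""
    return n_weights * LAYER_BITS[layer][0]
```

With this configuration, 1000 weights take 8000 bits in conv1 but 4000 bits in conv2, which is the memory saving that narrower layers are meant to provide; values outside the representable range saturate, so the choice of scale per layer matters.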
