Pyramidal Neuron Architectures for Accelerating Deep Neural Networks on FPGA

This paper presents three Pyramidal Neuron Accelerator Architecture (PNAA) units that can be used to accelerate different deep neural network algorithms. The main concept the three proposed PNAA units rely on to accelerate computation is the parallelism offered by the Field Programmable Gate Array (FPGA) used as the target hardware platform. Each of the three PNAA units has different spatial dimensions, matching the standard weight filter sizes of the convolution layers in convolutional neural networks: 3x3, 5x5, and 7x7. The proposed PNAA units are designed with 8-bit and 9-bit fixed-point numerical formats, are described in VHDL, and consist of three hierarchical layers. The computational throughput of the proposed PNAA units reaches up to 19.98 Giga Operations per Second (GOPS) for the 7x7 PNAA on high-density Stratix V FPGAs. The primary factor behind the high computational performance of the proposed systems is the replacement of the conventional Multiply-Accumulate (MAC) unit with the proposed Multiply Array Grid (MAG) units and Multiply Parallel Addition (MPA) units.
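As a rough illustration of the MAG + MPA idea summarized above, the following VHDL fragment sketches a 3x3 unit: nine signed fixed-point multipliers operate in parallel (the MAG stage) and their products are reduced by a pyramidal adder tree (the MPA stage). This is a minimal sketch, not the authors' RTL; the entity name pnaa_3x3_sketch, the port names, the signed interpretation of the 8-bit activations and 9-bit weights, and the exact partitioning into reduction layers are assumptions made for illustration only.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity pnaa_3x3_sketch is
  port (
    clk     : in  std_logic;
    pixels  : in  std_logic_vector(9*8-1 downto 0);  -- 3x3 window, 8-bit signed activations (assumed format)
    weights : in  std_logic_vector(9*9-1 downto 0);  -- 3x3 filter, 9-bit signed weights (assumed format)
    result  : out signed(20 downto 0)                -- widened sum of the nine products
  );
end entity pnaa_3x3_sketch;

architecture rtl of pnaa_3x3_sketch is
  type prod_t is array (0 to 8) of signed(16 downto 0);  -- 8-bit x 9-bit products
  type lvl1_t is array (0 to 4) of signed(17 downto 0);
  type lvl2_t is array (0 to 2) of signed(18 downto 0);
  signal products : prod_t;
  signal lvl1     : lvl1_t;
  signal lvl2     : lvl2_t;
begin
  -- MAG stage: nine fixed-point multipliers fire in parallel, one per filter tap,
  -- instead of a single multiplier iterated nine times as in a sequential MAC.
  gen_mag : for i in 0 to 8 generate
    products(i) <= signed(pixels(8*i+7 downto 8*i)) * signed(weights(9*i+8 downto 9*i));
  end generate;

  -- MPA stage: pyramidal adder tree reducing the nine products in logarithmic depth.
  -- Reduction layer 1: 9 products -> 5 partial sums
  gen_l1 : for i in 0 to 3 generate
    lvl1(i) <= resize(products(2*i), 18) + resize(products(2*i+1), 18);
  end generate;
  lvl1(4) <= resize(products(8), 18);

  -- Reduction layer 2: 5 partial sums -> 3
  lvl2(0) <= resize(lvl1(0), 19) + resize(lvl1(1), 19);
  lvl2(1) <= resize(lvl1(2), 19) + resize(lvl1(3), 19);
  lvl2(2) <= resize(lvl1(4), 19);

  -- Reduction layer 3: 3 partial sums -> final result, registered at the output
  process(clk)
  begin
    if rising_edge(clk) then
      result <= resize(lvl2(0), 21) + resize(lvl2(1), 21) + resize(lvl2(2), 21);
    end if;
  end process;
end architecture rtl;

Under these assumptions the unit consumes one complete 3x3 window per clock cycle, whereas a conventional sequential MAC would need nine multiply-accumulate cycles for the same window; the trade-off is nine hardware multipliers and an adder tree instead of a single multiplier and accumulator.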
