Enhanced Efficiency 3D Convolution Based on Optimal FPGA Accelerator

This paper presents an enhanced-efficiency 3-D convolution operator based on an optimal field-programmable gate array (FPGA) accelerator platform. The proposed system takes advantage of intermediate data delay lines, implemented in the FPGA, to avoid repeatedly loading the input feature maps. The 3-D convolution accelerator performs 268.07 giga operations per second at a 100-MHz operating frequency, with a power consumption of 330 mW. We experimentally demonstrate the enhanced efficiency of the proposed convolution accelerator in comparison with conventional technologies. The proposed 3-D convolution accelerator may find applications in neural networks and video processing.
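
The delay-line idea can be illustrated with a small software model. The sketch below is not the paper's architecture; it is a minimal Python illustration under stated assumptions: a cubic K x K x K kernel, voxels streamed in raster order (x fastest, then y, then z), no padding, and CNN-style convolution without kernel flipping. The function names `conv3d_direct` and `conv3d_delay_line` are hypothetical. A single shift register stands in for the delay lines, and every window element is read from a fixed tap, so each input voxel is loaded exactly once.

```python
from collections import deque
from itertools import product


def conv3d_direct(vol, ker):
    """Reference 3-D convolution: re-reads every voxel of every window."""
    D, H, W = len(vol), len(vol[0]), len(vol[0][0])
    K = len(ker)
    out = {}
    for z, y, x in product(range(D - K + 1), range(H - K + 1), range(W - K + 1)):
        out[(z, y, x)] = sum(
            vol[z + dz][y + dy][x + dx] * ker[dz][dy][dx]
            for dz, dy, dx in product(range(K), repeat=3)
        )
    return out


def conv3d_delay_line(vol, ker):
    """Streaming 3-D convolution: each voxel enters a delay line exactly once."""
    D, H, W = len(vol), len(vol[0]), len(vol[0][0])
    K = len(ker)
    # The delay line must span K-1 full planes, K-1 full rows, and K samples.
    depth = (K - 1) * H * W + (K - 1) * W + K
    line = deque(maxlen=depth)
    # Fixed tap positions, counted back from the newest sample.
    taps = [
        ((K - 1 - dz) * H * W + (K - 1 - dy) * W + (K - 1 - dx), ker[dz][dy][dx])
        for dz, dy, dx in product(range(K), repeat=3)
    ]
    out, loads = {}, 0
    for z, y, x in product(range(D), range(H), range(W)):
        line.append(vol[z][y][x])  # the only read of this voxel
        loads += 1
        if z >= K - 1 and y >= K - 1 and x >= K - 1:
            # A complete window now sits at the taps; emit one output.
            out[(z - K + 1, y - K + 1, x - K + 1)] = sum(
                line[-1 - off] * w for off, w in taps
            )
    return out, loads
```

In this model the number of input loads equals the number of voxels, whereas the direct version reads K^3 voxels per output. In hardware, the shift register would typically be split into row and plane buffers with the taps wired to parallel multipliers; that mapping is an assumption here, not a detail taken from the paper.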
