Low-Power Small-Area $3\times 3$ Convolution Hardware Design

Two-dimensional (2-D) convolution has a large influence on the overall performance of a neural network. A $3\times 3$ convolution (i.e., a convolution with a $3\times 3$ kernel) typically requires nine multipliers and one adder tree. In this paper, we propose a low-power small-area $3\times 3$ convolution hardware design. Different from previous works, we merge the nine multipliers' final addition stages into the adder tree, so that the products stay in redundant (carry-save) form and only a single carry-propagate addition is needed at the output. Compared with a conventional $3\times 3$ convolution design, experimental results show that the proposed design reduces power consumption by 14.9% and circuit area by 10.6%.
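The merging idea can be illustrated in software with a minimal sketch (the function and variable names below are illustrative, not from the paper): instead of letting each multiplier complete its own carry-propagate addition, the nine products are compressed through 3:2 carry-save adders, Dadda/Wallace style, and a single carry-propagate add resolves the final convolution sum.

```python
def csa(a, b, c):
    # 3:2 carry-save compressor on integers:
    # bitwise sum = a ^ b ^ c, carry = majority bits shifted left one place.
    # Invariant: s + cy == a + b + c.
    return a ^ b ^ c, ((a & b) | (b & c) | (a & c)) << 1

def reduce_carry_save(operands):
    # Dadda/Wallace-style reduction: repeatedly compress three operands
    # into two until only two remain, then perform the single remaining
    # carry-propagate addition.
    ops = list(operands)
    while len(ops) > 2:
        a, b, c = ops.pop(), ops.pop(), ops.pop()
        s, cy = csa(a, b, c)
        ops += [s, cy]
    return ops[0] + ops[1]  # the one carry-propagate addition

# 3x3 convolution: nine products summed through one merged reduction tree.
pixels  = [1, 2, 3, 4, 5, 6, 7, 8, 9]
weights = [9, 8, 7, 6, 5, 4, 3, 2, 1]
products = [p * w for p, w in zip(pixels, weights)]
assert reduce_carry_save(products) == sum(products)
```

In hardware the same principle goes further: the multipliers themselves never finish their additions, and their internal sum/carry vectors feed the shared compressor tree directly, which is what removes the redundant carry-propagate adders.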
