ASIC-based architecture for the real-time computation of 2D convolution with large kernel size

Bidimensional convolution is a low-level processing algorithm of interest in many areas, but its high computational cost constrains the size of the kernels, especially in real-time embedded systems. This paper presents a hardware architecture for the ASIC-based implementation of 2-D convolution with medium-to-large kernels. To address two issues, the efficiency of on-chip storage resources and the demand for off-chip bandwidth, we propose a cache structure built around data reuse. Images are cross-cached over multiple SPRAM blocks, and on-chip ping-pong operation takes full advantage of the data reuse inherent in the convolution calculation. On this basis we design a new ASIC data scheduling scheme and overall architecture. Experimental results show that the structure achieves real-time convolution with templates of size 40×32, improves the utilization of on-chip memory bandwidth and on-chip memory resources, and satisfies the conditions for maximum output data throughput while reducing the need for off-chip memory bandwidth.
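As a rough illustration of why data reuse lowers off-chip bandwidth, the following Python sketch keeps the most recent kernel-height image rows in a small buffer, so each pixel is fetched from the (simulated) off-chip image only once. It is a generic row-buffer model under stated assumptions, not the multi-block SPRAM ping-pong scheme of the paper; the names conv2d_line_buffered, naive_read_count and off_chip_reads are hypothetical.

```python
from collections import deque


def conv2d_line_buffered(image, kernel):
    """Valid-mode 2-D convolution that fetches each image row only once.

    `image` and `kernel` are lists of equal-length rows of numbers.
    Returns (output, off_chip_reads), where off_chip_reads counts the
    pixels fetched from the simulated off-chip image.
    """
    kh, kw = len(kernel), len(kernel[0])
    height, width = len(image), len(image[0])
    # Flip the kernel once so the inner loop is a plain multiply-accumulate.
    flipped = [row[::-1] for row in kernel[::-1]]
    # Hypothetical stand-in for kh on-chip row buffers (assumption).
    rows = deque(maxlen=kh)
    off_chip_reads = 0
    output = []
    for r in range(height):
        rows.append(list(image[r]))  # one off-chip fetch per image row
        off_chip_reads += width
        if len(rows) < kh:
            continue  # not enough rows buffered yet for a full window
        out_row = []
        for c in range(width - kw + 1):
            acc = 0
            for u in range(kh):
                for v in range(kw):
                    acc += rows[u][c + v] * flipped[u][v]
            out_row.append(acc)
        output.append(out_row)
    return output, off_chip_reads


def naive_read_count(height, width, kh, kw):
    """Pixel fetches if every output re-reads its whole window off-chip."""
    return (height - kh + 1) * (width - kw + 1) * kh * kw
```

In this model the buffered version reads height × width pixels in total, whereas the naive count grows with the kernel area, which is why the gap widens for large templates such as 40×32.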
