An Image Processing Architecture to Exploit I/O Bandwidth on Reconfigurable Computers

FPGA devices in reconfigurable computers (RCs) allow datapath, memory, and processing elements (PEs) to be customized in order to achieve very efficient algorithm implementations. However, the maximum speedup on RCs is bounded by the bandwidth available between microprocessors (μPs) and FPGA hardware accelerators. In this paper, an image processing architecture is presented that fully exploits this bandwidth and thus achieves the maximum possible speedup. This architecture can implement any convolution operation between an image and a kernel, and comprises four fully pipelined components: a line buffer, a data window, an array of PEs, and a data concatenating block. Multiple image processing algorithms, such as digital filters, edge detectors, and image transforms, have been successfully implemented using this architecture. In all cases, the maximum throughput is upper-bounded by the μP-FPGA I/O bandwidth, regardless of the complexity of the algorithm. The measured end-to-end throughput is 1.2 GB/s on the Cray XD1 and 2.1 GB/s on the SGI RC100.
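The dataflow of the line buffer, data window, and PE stages can be illustrated with a small software model. This is a sketch under stated assumptions, not the paper's hardware design: it assumes a single-channel image streamed one pixel per step in raster order, a single PE evaluating one K×K window per step, and "valid"-mode output (no border handling). The function name `stream_convolve` and the deque-based FIFOs are hypothetical illustrative choices; the parallelism of the PE array and the data concatenating block (which would pack results into bus-width words) are omitted.

```python
from collections import deque


def stream_convolve(pixels, width, kernel):
    """Software model of a line-buffer / data-window convolution pipeline.

    Hypothetical sketch: pixels arrive one per step in raster order, as they
    might from a host-to-FPGA stream. Yields one "valid"-mode 2-D convolution
    result per step once a full K x K window is available (raster order).
    """
    k = len(kernel)
    # Flip the kernel so the window dot product is a true convolution.
    flipped = [row[::-1] for row in kernel[::-1]]
    # Line buffer: k-1 chained FIFOs, each exactly one image row long.
    # line_buffers[d][0] holds the pixel d+1 rows above the incoming one.
    line_buffers = [deque([0] * width, maxlen=width) for _ in range(k - 1)]
    # Data window: k shift registers of k pixels, one per window row.
    window = [deque([0] * k, maxlen=k) for _ in range(k)]

    for index, pixel in enumerate(pixels):
        row, col = divmod(index, width)
        # Read the vertically aligned column, top (oldest row) to bottom.
        column = [lb[0] for lb in reversed(line_buffers)] + [pixel]
        # Advance the FIFO chain: each FIFO's oldest pixel feeds the next one.
        carry = pixel
        for lb in line_buffers:
            evicted = lb[0]
            lb.append(carry)  # maxlen drops `evicted` from the left end
            carry = evicted
        # Shift the new column into the data window.
        for r in range(k):
            window[r].append(column[r])
        # The PE produces a result only when the window lies inside the image.
        if row >= k - 1 and col >= k - 1:
            yield sum(
                flipped[r][c] * window[r][c]
                for r in range(k)
                for c in range(k)
            )


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    box = [[1, 1, 1]] * 3  # 3x3 box filter
    stream = (p for row in image for p in row)
    print(list(stream_convolve(stream, 4, box)))  # [54, 63, 90, 99]
```

Because every stage consumes one pixel and (once the pipeline is full) produces one result per step, throughput in this model is set by the input stream rate rather than by the kernel size, which mirrors the abstract's claim that throughput is bounded by the μP-FPGA I/O bandwidth.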
