On the Portability of the OpenCL Dwarfs on Fixed and Reconfigurable Parallel Platforms

We present a hardware architecture of a heapsort algorithm, the sorting is employed in a subband coding block of a wavelet-based image coder termed Oktem image coder. Although this coder provides good image quality, the sorting is time consuming, and is application specific, as the sorting is repetitively used for different volume of data in the subband coding, thus a simple hardware implementation with fixed sorting capacity will be difficult to scale during runtime. To tackle this problem, the time/power efficiency and the sorting size flexibility have to be taken in to account. We proposed an improved FPGA heapsort architecture based on Zabolotny's work as an IP accelerator of the image coder. We present a configurable architecture by using adaptive layer enable elements so the sorting capacity could be adjusted during runtime to efficiently sort different amount of data. With the adaptive memory shutdown, our improved architecture provides up to 20.9% power reduction on the memories compared to the baseline implementation. Moreover, our architecture provides 13x speedup compared to ARM CortexA 9.

[1]  Samuel Williams,et al.  The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .

[2]  Muhsen Owaida,et al.  Synthesis of Platform Architectures from OpenCL Programs , 2011, 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines.

[3]  Wojciech M. Zabołotny Dual port memory based Heapsort implementation for FPGA , 2011, Symposium on Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments (WILGA).

[4]  Vikram S. Adve,et al.  LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[5]  Jing Zhang,et al.  OpenCL and the 13 dwarfs: a work in progress , 2012, ICPE '12.