Improved Parallel Image Processing Algorithms by CUDA in GPU Environment
An integral histogram enables constant-time histogram computation over an image region. Mark Harris proposed a parallel prefix sum algorithm for CUDA GPGPU that can be applied to integral histogram initialization. Because CUDA restricts the number of threads in a block, Harris' algorithm divides a large image into multiple blocks. This division increases the number of global memory accesses and is a major cause of performance degradation. In this paper, we propose an allocation scheme that maps multiple pixels to each thread when the integral histogram is initialized for a large image. The proposed allocation scheme fully utilizes shared memory and reduces the number of global memory accesses. Experimental results show that the execution time of the proposed algorithm is 94.7% to 99.8% of that of Harris' algorithm.
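The two ideas in the abstract, a prefix sum in which each thread owns several elements and a constant-time region query on the resulting integral histogram, can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the paper's CUDA implementation: the function names, the `elems_per_thread` parameter, and the three-phase structure (local scan, scan of per-thread totals, offset add) are hypothetical choices that follow a common blocked-scan pattern. Each "thread" is simulated by a loop iteration, and the image is assumed to be non-empty with pixel values already quantized to bin indices.

```python
# Sketch only: simulates sequentially a scan in which each "thread" owns a chunk.
# All names and parameters are illustrative assumptions, not taken from the paper.

def blocked_inclusive_scan(values, elems_per_thread):
    """Inclusive prefix sum where each simulated thread handles a contiguous chunk."""
    chunks = [values[i:i + elems_per_thread]
              for i in range(0, len(values), elems_per_thread)]

    # Phase 1: every thread scans its own chunk sequentially.
    local = []
    for chunk in chunks:
        running, out = 0, []
        for v in chunk:
            running += v
            out.append(running)
        local.append(out)

    # Phase 2: exclusive scan over the per-thread totals (one value per thread).
    offsets, running = [], 0
    for out in local:
        offsets.append(running)
        running += out[-1]

    # Phase 3: each thread adds its offset to the elements of its own chunk.
    return [x + off for out, off in zip(local, offsets) for x in out]


def integral_histogram(image, num_bins, elems_per_thread=4):
    """planes[b][y][x] = number of pixels with bin b in rows 0..y and columns 0..x."""
    height, width = len(image), len(image[0])
    planes = []
    for b in range(num_bins):
        # Row-wise scan of the 0/1 indicator image for bin b.
        rows = [blocked_inclusive_scan([1 if p == b else 0 for p in row],
                                       elems_per_thread)
                for row in image]
        # Column-wise scan of the row-scanned result.
        cols = [blocked_inclusive_scan([rows[y][x] for y in range(height)],
                                       elems_per_thread)
                for x in range(width)]
        planes.append([[cols[x][y] for x in range(width)] for y in range(height)])
    return planes


def region_histogram(planes, top, left, bottom, right):
    """Histogram of the inclusive rectangle, using four lookups per bin."""
    def at(plane, y, x):
        # Positions above or left of the image contribute zero.
        return plane[y][x] if y >= 0 and x >= 0 else 0

    return [at(p, bottom, right) - at(p, top - 1, right)
            - at(p, bottom, left - 1) + at(p, top - 1, left - 1)
            for p in planes]
```

Once `integral_histogram` has been computed, calling `region_histogram(planes, top, left, bottom, right)` costs the same number of lookups regardless of the rectangle's size. On a GPU, the motivation for giving each thread several elements is that fewer threads, and hence fewer blocks, are needed to cover a row, so more of the work can stay in shared memory.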
[1] Naga K. Govindaraju, et al. Fast scan algorithms on graphics processors, 2008, ICS '08.
[2] Anjul Patney, et al. Efficient computation of sum-products on GPUs through software-managed cache, 2008, ICS '08.
[3] Mark J. Harris, et al. Parallel Prefix Sum (Scan) with CUDA, 2011.
[4] Yao Zhang, et al. Scan primitives for GPU computing, 2007, GH '07.
[5] W. Daniel Hillis, et al. Data parallel algorithms, 1986, CACM.