Improved Integral Histogram Algorithm for Big Sized Images in CUDA Environment

Although an integral histogram allows the histogram of any rectangular sub-area to be computed in constant time, constructing the integral histogram of an n × m image takes O(nm) steps. This construction time can be reduced with a parallel prefix sum algorithm. Mark Harris proposed an efficient parallel prefix sum and implemented it in CUDA for GPGPU. Harris' algorithm has two problems: it leaks shared memory, and it is inefficient for large images. In this paper, we propose a parallel prefix sum algorithm that prevents the leakage and handles large images efficiently. The proposed algorithm corrects the memory leakage through exact indexing for bank-conflict avoidance, and it eliminates inefficient global memory accesses by assigning multiple pixels to each thread. As a result, the average execution time of the proposed algorithm ranges from 95.6% to 101.9% of that of Harris' algorithm.
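To make the constant-time query and the O(nm) construction concrete, here is a minimal sequential Python sketch, not the paper's CUDA implementation. It assumes pixel values are already quantized to integer bin indices 0..bins-1; the names `integral_histogram` and `region_histogram` are hypothetical. Construction visits every pixel once per bin, while the histogram of any rectangle needs only four lookups per bin regardless of the rectangle's size.

```python
from typing import List


def integral_histogram(img: List[List[int]], bins: int) -> List[List[List[int]]]:
    """Build H where H[b][y][x] counts pixels of bin b in rows < y and columns < x.

    Assumes img holds pixel values already quantized to bins 0..bins-1.
    H carries an extra zero row and column, so queries need no bounds checks.
    """
    n, m = len(img), len(img[0])
    H = [[[0] * (m + 1) for _ in range(n + 1)] for _ in range(bins)]
    for b in range(bins):
        Hb = H[b]
        for y in range(n):
            run = 0  # prefix sum along row y of the indicator "pixel is in bin b"
            for x in range(m):
                run += 1 if img[y][x] == b else 0
                # adding the value from the row above makes this a prefix sum down the columns
                Hb[y + 1][x + 1] = Hb[y][x + 1] + run
    return H


def region_histogram(H: List[List[List[int]]], top: int, left: int,
                     bottom: int, right: int) -> List[int]:
    """Histogram of rows [top, bottom) and columns [left, right): 4 lookups per bin."""
    return [Hb[bottom][right] - Hb[top][right] - Hb[bottom][left] + Hb[top][left]
            for Hb in H]


img = [[0, 1, 1],
       [2, 1, 0],
       [0, 0, 2]]
H = integral_histogram(img, bins=3)
print(region_histogram(H, 1, 1, 3, 3))  # [2, 1, 1]
```

The row and column accumulations inside `integral_histogram` are prefix sums, which is exactly the part of the construction that a parallel scan replaces on the GPU.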

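The scan Harris describes is a work-efficient up-sweep/down-sweep scan over a shared-memory array, with each index padded so that threads avoid bank conflicts. The sketch below simulates that scan sequentially in Python: each iteration of a `t` loop stands in for one CUDA thread, and `temp` stands in for shared memory. Assumptions: 32 banks (`LOG_NUM_BANKS = 5`), a padding of `i >> LOG_NUM_BANKS` added to each index, and power-of-two sizes. `padded_size` and `scan_multi_per_thread` are hypothetical helpers that illustrate exact buffer sizing and several elements per thread; they are not the authors' code.

```python
LOG_NUM_BANKS = 5  # assumption: 32 shared-memory banks


def pad(i: int) -> int:
    """Bank-conflict-avoiding index: one padding slot after every 2**LOG_NUM_BANKS slots."""
    return i + (i >> LOG_NUM_BANKS)


def padded_size(n: int) -> int:
    """Hypothetical helper: exact buffer length so that pad(n - 1) is still in range."""
    return pad(n - 1) + 1


def blelloch_exclusive_scan(data):
    """Work-efficient exclusive scan; returns (prefix sums, total). len(data) must be 2**k."""
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    temp = [0] * padded_size(n)  # stands in for the shared-memory array
    for i, v in enumerate(data):
        temp[pad(i)] = v
    offset, d = 1, n >> 1
    while d > 0:  # up-sweep: build a reduction tree in place
        for t in range(d):  # each t plays the role of one thread
            ai, bi = offset * (2 * t + 1) - 1, offset * (2 * t + 2) - 1
            temp[pad(bi)] += temp[pad(ai)]
        offset, d = offset << 1, d >> 1
    total = temp[pad(n - 1)]
    temp[pad(n - 1)] = 0  # clear the root before the down-sweep
    d = 1
    while d < n:  # down-sweep: push partial sums back down the tree
        offset >>= 1
        for t in range(d):
            ai, bi = offset * (2 * t + 1) - 1, offset * (2 * t + 2) - 1
            temp[pad(ai)], temp[pad(bi)] = temp[pad(bi)], temp[pad(bi)] + temp[pad(ai)]
        d <<= 1
    return [temp[pad(i)] for i in range(n)], total


def scan_multi_per_thread(data, per_thread: int):
    """Hypothetical variant: each thread serially scans `per_thread` consecutive elements."""
    assert len(data) % per_thread == 0
    n_threads = len(data) // per_thread
    local, sums = [], []
    for t in range(n_threads):  # phase 1: serial scan of each thread's own chunk
        run, out = 0, []
        for v in data[t * per_thread:(t + 1) * per_thread]:
            out.append(run)
            run += v
        local.append(out)
        sums.append(run)
    offsets, total = blelloch_exclusive_scan(sums)  # phase 2: tree scan of the chunk totals
    # phase 3: each thread adds the sum of all preceding chunks to its local results
    return [x + offsets[t] for t in range(n_threads) for x in local[t]], total


print(blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# ([0, 3, 4, 11, 11, 15, 16, 22], 25)
```

With n = 64 the largest padded index is `pad(63) == 64`, so a buffer of only n slots would be overrun; sizing the buffer from the padded index keeps every access in range. Giving each thread a serial chunk of k elements means the tree scan runs over n/k chunk totals instead of n elements.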
[1] Naga K. Govindaraju et al. Fast scan algorithms on graphics processors, 2008, ICS '08.

[2] Fatih Murat Porikli et al. Integral histogram: a fast way to extract histograms in Cartesian spaces, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[3] Yao Zhang et al. Scan primitives for GPU computing, 2007, GH '07.

[4] W. Daniel Hillis et al. Data parallel algorithms, 1986, CACM.

[5] Anjul Patney et al. Efficient computation of sum-products on GPUs through software-managed cache, 2008, ICS '08.

[6] Mark J. Harris et al. Parallel Prefix Sum (Scan) with CUDA, 2011.

[7] Ehud Rivlin et al. Robust Fragments-based Tracking using the Integral Histogram, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).