Efficient Mapping of Multiresolution Image Filtering Algorithms on Graphics Processors

In the last decade, research and development of massively parallel commodity graphics hardware has grown dramatically in both academia and industry. Graphics card architectures are well suited to the parallel execution of many number-crunching loop programs from fields such as image processing and linear algebra. However, mapping such algorithms efficiently to graphics hardware is hard, even with detailed insight into the architecture. This paper presents a multiresolution image processing algorithm and shows how this class of algorithms can be mapped efficiently to graphics hardware. Furthermore, it illustrates the impact of the execution configuration and proposes a method to determine the best configuration offline so that it can be used at run time. Using CUDA as the programming model, it is demonstrated that the image processing algorithm is significantly accelerated: a speedup of up to 33x is achieved on NVIDIA's Tesla C870 compared to a parallelized implementation on a quad-core Xeon.
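The abstract does not describe how the offline configuration search works. The sketch below is therefore only one plausible reading, not the paper's method. It assumes an exhaustive search over power-of-two 2D block shapes, limited to 512 threads per block (the limit for compute capability 1.0 devices such as the Tesla C870) and to whole warps of 32 threads. The search times each candidate once per image size and stores the fastest one in a lookup table that is consulted at run time. The names `candidate_configs`, `grid_for`, `tune_offline`, `lookup`, and `run_kernel` are hypothetical. In practice `run_kernel` would launch and time the real CUDA kernel (for example with CUDA events); here it is replaced by a synthetic cost function.

```python
import itertools
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Config = Tuple[int, int]  # (threads per block in x, threads per block in y)


def candidate_configs(max_threads: int = 512, warp: int = 32) -> Iterator[Config]:
    """Yield power-of-two 2D block shapes that fit the per-block thread limit
    and contain a whole number of warps (assumed search space)."""
    sizes = [2 ** k for k in range(10)]  # 1, 2, 4, ..., 512
    for bx, by in itertools.product(sizes, sizes):
        threads = bx * by
        if threads <= max_threads and threads % warp == 0:
            yield (bx, by)


def grid_for(width: int, height: int, cfg: Config) -> Tuple[int, int]:
    """Number of blocks needed to cover a width x height image (ceiling division)."""
    bx, by = cfg
    return ((width + bx - 1) // bx, (height + by - 1) // by)


def tune_offline(
    image_sizes: List[Tuple[int, int]],
    run_kernel: Callable[[int, int, Config], float],
    repeats: int = 3,
) -> Dict[Tuple[int, int], Optional[Config]]:
    """Offline step: for each image size, time every candidate configuration
    and record the fastest one. `run_kernel` (hypothetical) returns elapsed time."""
    table: Dict[Tuple[int, int], Optional[Config]] = {}
    for w, h in image_sizes:
        best_cfg: Optional[Config] = None
        best_time = float("inf")
        for cfg in candidate_configs():
            # Keep the minimum of several runs to reduce timing noise.
            t = min(run_kernel(w, h, cfg) for _ in range(repeats))
            if t < best_time:
                best_cfg, best_time = cfg, t
        table[(w, h)] = best_cfg
    return table


def lookup(table: Dict[Tuple[int, int], Optional[Config]],
           width: int, height: int,
           default: Config = (16, 16)) -> Config:
    """Run-time step: reuse the stored configuration, or fall back to a default."""
    cfg = table.get((width, height))
    return cfg if cfg is not None else default


if __name__ == "__main__":
    # Stand-in for a real GPU benchmark: a synthetic cost that favors 32x4 blocks.
    def fake_run(w: int, h: int, cfg: Config) -> float:
        return float(abs(cfg[0] - 32) + abs(cfg[1] - 4))

    table = tune_offline([(1024, 768)], fake_run)
    cfg = lookup(table, 1024, 768)
    print(cfg, grid_for(1024, 768, cfg))  # (32, 4) (32, 192)
```

Because the search runs once per image size ahead of time, its cost is kept off the critical path. At run time, selecting a configuration is just a dictionary lookup.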
