Parallel multi‐level 2D‐DWT on CUDA GPUs and its application in ring artifact removal

This paper presents two schemes for the parallel 2D discrete wavelet transform (DWT) on Compute Unified Device Architecture (CUDA) graphics processing units (GPUs). In the first scheme, the image and the filter are transformed to the spectral domain using the fast Fourier transform (FFT), multiplied, and then transformed back to the spatial domain using the inverse FFT. In the second scheme, the image pixels are convolved directly with the filters. Because there is no data dependency between output positions, the convolutions at different positions can be executed concurrently. To reduce data transfer, boundary extension and down‐sampling are performed during the data loading stage, and transposition is completed implicitly during data storage. A similar technique is adopted when parallelizing the inverse 2D DWT. To further speed up data access, the filter coefficients are stored in constant memory. We have parallelized the 2D DWT for dozens of wavelet types and achieved a speedup of over 380 times compared with the CPU version. We applied the parallel 2D DWT in a ring artifact removal procedure, where execution was accelerated by nearly 200 times compared with the CPU version. The experimental results show that the proposed parallel 2D DWT on GPUs can significantly improve performance for a wide variety of wavelet types and is promising for various applications. Copyright © 2015 John Wiley & Sons, Ltd.
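
The second (direct convolution) scheme can be illustrated with a small host-side sketch in Python. This is not the authors' CUDA code. It makes the following assumptions:

* Orthonormal Haar analysis filters stand in for the many wavelet types.
* Half-sample symmetric extension is the boundary rule.
* Each filtered row keeps ceil(n/2) samples.
* The names `ext`, `one_output`, `row_pass_transposed`, `dwt2_level` and `dwt2` are hypothetical.

Each call to `one_output` corresponds to the work of one GPU thread. Boundary extension is an index remap at load time, and down-sampling means only every second output is evaluated. Results are written transposed, so applying the same row pass a second time filters the original columns.

```python
import math

# Assumed example filters: orthonormal Haar analysis pair (low-pass, high-pass).
# In a CUDA version these small read-only arrays would live in __constant__
# memory, because every thread reads the same coefficients.
S = 1.0 / math.sqrt(2.0)
LO = (S, S)
HI = (S, -S)


def ext(i, n):
    """Half-sample symmetric boundary extension done as an index remap at load
    time, so no padded copy of the image is ever built."""
    while i < 0 or i >= n:
        i = -i - 1 if i < 0 else 2 * n - i - 1
    return i


def one_output(row, k, filt):
    """Work of one (hypothetical) GPU thread: output sample k of a filtered,
    down-sampled row. Only the kept position 2k + 1 is evaluated, so the
    discarded samples are never computed."""
    n = len(row)
    return sum(c * row[ext(2 * k + 1 - j, n)] for j, c in enumerate(filt))


def row_pass_transposed(img, lo=LO, hi=HI):
    """Filter and down-sample every row of an H x W image, storing the results
    transposed: returns (low_t, high_t), each ceil(W/2) rows of length H."""
    h, w = len(img), len(img[0])
    out_w = (w + 1) // 2
    low_t = [[0.0] * h for _ in range(out_w)]
    high_t = [[0.0] * h for _ in range(out_w)]
    # Every (r, k) pair is independent; on a GPU this double loop is the grid.
    for r in range(h):
        for k in range(out_w):
            low_t[k][r] = one_output(img[r], k, lo)  # transposed store
            high_t[k][r] = one_output(img[r], k, hi)
    return low_t, high_t


def dwt2_level(img, lo=LO, hi=HI):
    """One 2D DWT level as two row passes; the second pass filters the original
    columns and restores the original orientation. Subbands are named
    (horizontal filter, vertical filter)."""
    low_t, high_t = row_pass_transposed(img, lo, hi)
    ll, lh = row_pass_transposed(low_t, lo, hi)
    hl, hh = row_pass_transposed(high_t, lo, hi)
    return ll, lh, hl, hh


def dwt2(img, levels, lo=LO, hi=HI):
    """Multi-level 2D DWT: recurse on the LL band, collecting detail bands."""
    details = []
    for _ in range(levels):
        img, lh, hl, hh = dwt2_level(img, lo, hi)
        details.append((lh, hl, hh))
    return img, details
```

For a 2 x 2 input `[[1, 2], [3, 4]]`, `dwt2_level` returns the following values, up to floating-point rounding:

* LL = 5
* LH = 2
* HL = 1
* HH = 0

On a GPU, the loops in `row_pass_transposed` would presumably become a 2D grid of threads, and `LO`/`HI` would be copied to `__constant__` memory. The inverse transform could follow the same load/store pattern with synthesis filters and up-sampling.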
