A Parallel Implementation of the 2D Wavelet Transform Using CUDA

One multicore platform is currently attracting enormous attention because of its potential for sustained performance: the NVIDIA Tesla boards. These cards, intended for general-purpose computing on graphics processing units (GPGPU), are used as data-parallel computing devices. They are based on the Compute Unified Device Architecture (CUDA), which is common to the latest NVIDIA GPUs. The result is a multicore platform that offers a large potential performance benefit but is driven by a non-traditional programming model. In this paper we provide some insight into the peculiarities of CUDA for scientific computing by means of a specific example. In particular, we show that the parallelization of the two-dimensional fast wavelet transform on the NVIDIA Tesla C870 achieves a speedup of 20.8 for an image size of 8192x8192 when compared with the fastest host-only implementation using OpenMP, with the data transfers between main memory and device memory included in the measurement.
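To make the computation concrete, the following is a minimal sequential sketch of one decomposition level of a separable two-dimensional wavelet transform. It is not the CUDA implementation described here: the abstract does not name the filter, so the orthonormal Haar filter is assumed purely for brevity, and the function names `haar_1d` and `haar_2d_level` are hypothetical. It also assumes even image dimensions.

```python
import math


def haar_1d(signal):
    """One Haar level on a 1D sequence: approximations first, details second."""
    scale = 1.0 / math.sqrt(2.0)
    half = len(signal) // 2
    approx = [(signal[2 * i] + signal[2 * i + 1]) * scale for i in range(half)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) * scale for i in range(half)]
    return approx + detail


def haar_2d_level(image):
    """Separable 2D step: filter every row, then every column of the result."""
    # Row pass: each row is transformed independently of the others.
    rows = [haar_1d(row) for row in image]
    # Column pass: transpose, transform each column, transpose back.
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

Every row in the first pass, and every column in the second, can be processed without reference to the others. That independence is what makes the transform a natural candidate for a data-parallel device, although how the work is mapped to threads and memory is a separate design question that this sketch does not address.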
