Block-based two-dimensional wavelet transform running on graphics processing unit

This study explores the use of graphics processing units (GPUs) for performing the two-dimensional discrete wavelet transform (DWT) of images. Research into fast wavelet transforms has been driven both by the enormous volumes of data produced by modern cameras and by the need to process these data in real time. With the emergence of general-purpose computing on GPUs, many time-consuming applications have begun to reap the associated benefits. Published GPU-based DWT implementations follow one of two approaches: the row-column (RC) approach and the block-based (BB) approach. Most state-of-the-art techniques are based on the RC approach, which exploits the parallelism between different rows and columns; few works are based on the BB approach, which exploits the parallelism between different blocks of the image. Although the RC approach is easy to implement, its resource usage usually scales with the image size. Another shortcoming of the RC approach, according to the authors' analysis, is that it requires more global memory accesses. The authors therefore adopt the BB approach in this study. Experimental results show that the proposed BB approach outperforms the RC approach and is 99× faster than a native CPU implementation for 4096 × 4096 images.
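To make the RC/BB distinction concrete, the following Python sketch (not the authors' CUDA implementation) performs one level of a 2D Haar DWT both ways on a CPU. The `GlobalMemory` class is a hypothetical stand-in that simply counts element reads and writes to full-image buffers, mimicking GPU global memory traffic; each row or column in `dwt_rc` and each tile in `dwt_bb` plays the role of an independent GPU work item. The unnormalised Haar filter is an assumption chosen because its 2-tap support lets tiles be processed with no overlap; longer filters such as the 5/3 or 9/7 wavelets used in JPEG2000 would need each tile to load a halo of neighbouring pixels.

```python
# Illustrative sketch only: one-level 2D Haar DWT, row-column (RC) vs block-based (BB).
# Assumptions (not from the paper): unnormalised Haar, Mallat subband layout,
# square image whose side is a multiple of the (even) tile size.

class GlobalMemory:
    """Hypothetical counter of element reads/writes to full-size buffers (models GPU global memory)."""
    def __init__(self):
        self.accesses = 0

    def alloc(self, n):
        return [[0.0] * n for _ in range(n)]

    def read(self, buf, r, c):
        self.accesses += 1
        return buf[r][c]

    def write(self, buf, r, c, v):
        self.accesses += 1
        buf[r][c] = v


def haar_1d(x):
    """One Haar level: pairwise averages first, then pairwise differences."""
    half = len(x) // 2
    avg = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    diff = [x[2 * i] - x[2 * i + 1] for i in range(half)]
    return avg + diff


def dwt_rc(img, gmem):
    """RC: full row pass, intermediate stored in global memory, then full column pass."""
    n = len(img)
    tmp, out = gmem.alloc(n), gmem.alloc(n)
    for r in range(n):  # one work item per row
        row = haar_1d([gmem.read(img, r, c) for c in range(n)])
        for c in range(n):
            gmem.write(tmp, r, c, row[c])
    for c in range(n):  # one work item per column
        col = haar_1d([gmem.read(tmp, r, c) for r in range(n)])
        for r in range(n):
            gmem.write(out, r, c, col[r])
    return out


def dwt_bb(img, gmem, tile=4):
    """BB: each tile is loaded once, transformed locally, and written once."""
    n = len(img)
    h, t2 = n // 2, tile // 2
    out = gmem.alloc(n)
    for by in range(0, n, tile):  # one work item ("thread block") per tile
        for bx in range(0, n, tile):
            # Load the tile into local ("shared") storage with one global read per pixel.
            s = [[gmem.read(img, by + i, bx + j) for j in range(tile)] for i in range(tile)]
            s = [haar_1d(row) for row in s]  # local row pass
            cols = [haar_1d([s[i][j] for i in range(tile)]) for j in range(tile)]  # local column pass
            for i in range(tile):
                for j in range(tile):
                    # Map local (i, j) to its position in the global Mallat layout.
                    gi = (i // t2) * h + by // 2 + i % t2
                    gj = (j // t2) * h + bx // 2 + j % t2
                    gmem.write(out, gi, gj, cols[j][i])
    return out


if __name__ == "__main__":
    n = 8
    img = [[float((3 * r + 5 * c) % 17) for c in range(n)] for r in range(n)]
    g_rc, g_bb = GlobalMemory(), GlobalMemory()
    print(dwt_rc(img, g_rc) == dwt_bb(img, g_bb, tile=4))  # True
    print(g_rc.accesses, g_bb.accesses)  # 256 128
```

Under this simple model, the RC version touches global memory four times per pixel (read the input, write the intermediate, read the intermediate, write the result) while the BB version touches it twice, which is the kind of saving the abstract attributes to the BB approach. Real GPU performance also depends on memory coalescing, shared-memory capacity, and halo overhead, which the sketch ignores.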