Improving the performance of 2D Discrete Wavelet Transform using data-level parallelism

The JPEG2000 standard uses the 2D Discrete Wavelet Transform (2D DWT), whereas the JPEG standard uses the 2D Discrete Cosine Transform (2D DCT). The 2D DWT, however, has higher computational requirements than the 2D DCT and consumes a significant part of the total JPEG2000 encoding time. One way to improve the performance of the 2D DWT is to use parallel techniques on a SIMD-enhanced architecture. In this paper, we apply a data-level parallelism technique to exploit the parallelism available in the 2D DWT. We focus on two algorithms for traversing an image when computing the 2D DWT, namely the Row-Column Wavelet Transform (RCWT) and the Line-Based Wavelet Transform (LBWT). Our experimental results show that the SIMD implementation of the LBWT algorithm is more complicated than the SIMD implementation of the RCWT algorithm, but the former is 1.60 times faster than the latter for an image of size 4096 × 4096.
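
The difference between the two traversal orders can be illustrated with a minimal scalar sketch. It assumes a one-level, unnormalized Haar filter purely for brevity (JPEG2000 itself specifies the 5/3 and 9/7 filters), an image with even width and height, and hypothetical function names (`haar_1d`, `rcwt`, `lbwt`); it is not the paper's SIMD implementation. RCWT filters every row of the image and only then every column, while LBWT starts the vertical filtering as soon as enough horizontally filtered rows are available, which for Haar is two.

```python
def haar_1d(x):
    """One level of an unnormalized Haar transform: averages, then differences."""
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return low + high


def rcwt(img):
    """Row-column traversal: horizontal pass over the whole image, then vertical pass."""
    rows = [haar_1d(r) for r in img]                # filter every row first
    cols = [haar_1d(list(c)) for c in zip(*rows)]   # then filter every column
    return [list(r) for r in zip(*cols)]            # transpose back to row-major order


def lbwt(img):
    """Line-based traversal: vertical filtering runs as soon as two filtered rows exist."""
    low_rows, high_rows = [], []
    for r in range(0, len(img), 2):
        a = haar_1d(img[r])        # horizontal filtering of one row pair...
        b = haar_1d(img[r + 1])
        # ...immediately followed by the vertical step on that pair
        low_rows.append([(p + q) / 2 for p, q in zip(a, b)])
        high_rows.append([(p - q) / 2 for p, q in zip(a, b)])
    return low_rows + high_rows


# Illustrative 4 x 4 image (an assumption, not data from the paper).
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
```

Both functions return the same coefficients; only the order in which the image is touched differs. In the line-based version each row pair is consumed by the vertical step right after it is produced, whereas the row-column version walks the whole image column by column in its second pass.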
