A single-loop approach to SIMD parallelization of 2D wavelet lifting

The widespread use of wavelet transforms, as in JPEG2000, demands efficient implementations on general-purpose computers as well as on dedicated hardware. The increasing availability of SIMD technologies poses a considerable challenge, since efficient SIMD parallelizations are not trivial. This work presents a parallelized 2D wavelet transform following a single-loop approach, i.e. a loop fusion of the lifting steps of horizontal filtering combined with an interleaving of horizontal and vertical filtering for optimal temporal locality. In this way, each input value is read only once and each output value is written once without subsequent updates. Such an approach turns out to be a necessary basis for an efficient SIMD parallelization. Results are obtained on a general-purpose processor with a 4-fold single-precision SIMD extension. Speedups of about 3.7 due to the use of SIMD, 2.55 due to the single-loop approach, and up to 6 due to cache effects for pathological data sizes are obtained, giving total speedups of up to 56.
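The loop fusion of lifting steps described above can be illustrated with a minimal 1D sketch. It is not the implementation evaluated in this work: the two-step 5/3 lifting filter, the symmetric boundary extension, the even-length input, and the function names `lift53_two_pass` and `lift53_single_loop` are all assumptions made for illustration, and neither SIMD nor the interleaving with vertical filtering is shown. The first function runs one loop per lifting step and updates values in place; the second fuses both steps so that each input sample is read once and each output coefficient is written once.

```python
def lift53_two_pass(x):
    """Conventional lifting: one loop per lifting step (assumes even len(x))."""
    half = len(x) // 2
    s = [float(v) for v in x[0::2]]   # even samples -> low-pass coefficients
    d = [float(v) for v in x[1::2]]   # odd samples  -> high-pass coefficients
    # Predict step: every d[i] is updated in place.
    for i in range(half):
        right = s[i + 1] if i + 1 < half else s[half - 1]  # symmetric extension
        d[i] -= 0.5 * (s[i] + right)
    # Update step: a second sweep over the data updates every s[i] in place.
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]                 # symmetric extension
        s[i] += 0.25 * (left + d[i])
    return s, d


def lift53_single_loop(x):
    """Fused lifting: both steps in one loop, no in-place updates."""
    half = len(x) // 2
    s = [0.0] * half
    d = [0.0] * half
    even = float(x[0])   # current even sample, carried between iterations
    d_prev = 0.0         # previous high-pass coefficient, carried as well
    for i in range(half):
        odd = float(x[2 * i + 1])
        # The next even sample is read here once and reused in the next iteration.
        next_even = float(x[2 * i + 2]) if i + 1 < half else even
        d_i = odd - 0.5 * (even + next_even)   # predict
        left = d_prev if i > 0 else d_i        # symmetric extension at the start
        s[i] = even + 0.25 * (left + d_i)      # update, written exactly once
        d[i] = d_i                             # written exactly once
        even, d_prev = next_even, d_i
    return s, d


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7, 8]
    print(lift53_two_pass(signal))
    print(lift53_single_loop(signal))
```

Both functions perform the same arithmetic in the same order and therefore return identical coefficients. The fused version only needs a few carried scalars (`even`, `d_prev`) instead of a second pass over the arrays, which is the property the single-loop approach exploits.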
