2-D Wavelet Transform Enhancement on General-Purpose Microprocessors: Memory Hierarchy and SIMD Parallelism Exploitation

This paper addresses the implementation of the 2-D Discrete Wavelet Transform on general-purpose microprocessors, focusing on both memory hierarchy and SIMD parallelization issues. The two topics are related, since SIMD extensions are only useful if the memory hierarchy is exploited efficiently. In this work, locality has been significantly improved by means of a novel approach called pipelined computation, which complements previous techniques based on loop tiling and non-linear layouts. As experimental platforms we have employed a Pentium-III (P-III) and a Pentium-4 (P-4) microprocessor. Our SIMD-oriented tuning, however, has been performed exclusively at the source code level: we have reordered some loops and introduced modifications that allow automatic vectorization. Given the abstraction level at which the optimizations are carried out, the speedups obtained on the investigated platforms are quite satisfactory, even though further improvement can be obtained by lowering the level of abstraction (compiler intrinsics or assembly code).
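As an illustration of the kind of source-level loop reordering referred to above, the following minimal sketch contrasts two traversal orders for the vertical filtering step on a row-major image. It is not the code used in this work: the function names are hypothetical, and a 2-tap Haar filter (average and difference of adjacent rows) is assumed purely to keep the example short. The first routine walks each column with a stride of one full row, which suits neither the cache nor a vectorizing compiler. The second computes the same result with the column index in the inner loop, so all accesses are unit-stride.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not taken from the paper.
 * One vertical Haar step on a row-major rows x cols image.
 * Inner loop moves down a column: consecutive accesses are cols floats apart. */
void haar_vertical_by_column(const float *in, float *low, float *high,
                             size_t rows, size_t cols)
{
    assert(rows % 2 == 0);
    for (size_t j = 0; j < cols; ++j) {
        for (size_t i = 0; i < rows / 2; ++i) {
            float a = in[(2 * i) * cols + j];
            float b = in[(2 * i + 1) * cols + j];
            low[i * cols + j]  = 0.5f * (a + b);
            high[i * cols + j] = 0.5f * (a - b);
        }
    }
}

/* Same computation with the loops interchanged.
 * Inner loop moves along a row: unit-stride loads and stores, no
 * loop-carried dependence, which is the form a vectorizing compiler expects. */
void haar_vertical_by_row(const float *in, float *low, float *high,
                          size_t rows, size_t cols)
{
    assert(rows % 2 == 0);
    for (size_t i = 0; i < rows / 2; ++i) {
        const float *top = in + (2 * i) * cols;   /* even input row */
        const float *bot = top + cols;            /* odd input row  */
        float *lo = low + i * cols;
        float *hi = high + i * cols;
        for (size_t j = 0; j < cols; ++j) {
            lo[j] = 0.5f * (top[j] + bot[j]);
            hi[j] = 0.5f * (top[j] - bot[j]);
        }
    }
}
```

Both routines assume that the output buffers hold (rows / 2) x cols floats each and do not overlap the input. Whether a given compiler actually vectorizes the second form depends on its alias analysis and optimization flags, so the compiler's vectorization report should be checked rather than taken for granted.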
