Performance comparison of SIMD implementations of the discrete wavelet transform

This paper focuses on SIMD implementations of the 2D discrete wavelet transform (DWT). The transforms considered are Daubechies' real-to-real four-coefficient transform (Daub-4) and the integer-to-integer (5, 3) lifting scheme. Daub-4 is implemented using SSE and the lifting scheme using MMX, and their performance is compared to C implementations on a Pentium 4 processor. The MMX implementation of the lifting scheme is up to 4.0× faster than the corresponding C program for a 1-level 2D DWT, while the SSE implementation of Daub-4 is up to 2.6× faster than the C version. It is shown that for some image sizes, performance is significantly hampered by the so-called 64K aliasing problem, which occurs on the Pentium 4 when two accessed data blocks lie a multiple of 64K apart. It is also shown that for the (5, 3) lifting scheme, a 12-bit word size is sufficient for a 5-level decomposition of the 2D DWT for images of up to 10 bits per pixel.
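
To make the integer-to-integer (5, 3) lifting scheme concrete, the following is a minimal scalar C sketch of one 1D decomposition level and its inverse. It is not the paper's MMX code. It assumes the standard reversible 5/3 predict and update steps as used in JPEG 2000, an even-length signal, and symmetric extension at the borders; the names lift53_forward, lift53_inverse, and floor_div are hypothetical.

```c
#include <assert.h>

/* Floor division for a positive divisor b (C's '/' truncates toward zero). */
static int floor_div(int a, int b)
{
    int q = a / b;
    if (a % b != 0 && a < 0)
        q--;
    return q;
}

/* Forward 1D (5, 3) lifting step (assumed JPEG 2000 reversible form).
 * x[0..n-1] is split into low-pass s[0..n/2-1] and high-pass d[0..n/2-1]. */
void lift53_forward(const int *x, int n, int *s, int *d)
{
    assert(n >= 2 && n % 2 == 0);   /* assumption: even-length input */
    int half = n / 2;

    /* Predict: each odd sample minus the floored mean of its even neighbours. */
    for (int i = 0; i < half; i++) {
        /* Symmetric extension: x[n] mirrors to x[n-2]. */
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[n - 2];
        d[i] = x[2 * i + 1] - floor_div(x[2 * i] + right, 2);
    }

    /* Update: each even sample plus a rounded quarter of adjacent details. */
    for (int i = 0; i < half; i++) {
        /* Symmetric extension: d[-1] mirrors to d[0]. */
        int left = (i > 0) ? d[i - 1] : d[0];
        s[i] = x[2 * i] + floor_div(left + d[i] + 2, 4);
    }
}

/* Inverse step: undoes the update, then the predict, reconstructing x exactly. */
void lift53_inverse(const int *s, const int *d, int n, int *x)
{
    assert(n >= 2 && n % 2 == 0);
    int half = n / 2;

    /* Undo update: recover all even samples first. */
    for (int i = 0; i < half; i++) {
        int left = (i > 0) ? d[i - 1] : d[0];
        x[2 * i] = s[i] - floor_div(left + d[i] + 2, 4);
    }

    /* Undo predict: recover odd samples from the restored even samples. */
    for (int i = 0; i < half; i++) {
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[n - 2];
        x[2 * i + 1] = d[i] + floor_div(x[2 * i] + right, 2);
    }
}
```

Because every step adds or subtracts an integer computed only from samples that are available unchanged at reconstruction time, the inverse recovers the input exactly. A 2D level would apply the same step along rows and then columns.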
