Optimizing Cache Performance of the Discrete Wavelet Transform Using a Visualization Tool

The 2D DWT consists of two 1D DWTs, one in each direction: horizontal filtering processes the rows, followed by vertical filtering, which processes the columns. It is well known that the performance of a straightforward implementation of vertical filtering varies considerably with the working set size. The only reasonable explanation for this is the access behavior of the cache memory. Vertical filtering is known to suffer from mapping conflicts in the cache when the working set size is a power of two. However, it is not clear how these conflicts arise or whether cache problems also exist with other data sizes. Such knowledge is the basis for efficient code optimization. To acquire this knowledge and to assess the optimization potential more accurately, we apply a cache visualization tool to examine the runtime cache activity of the vertical filtering implementation. We find that, besides mapping conflicts, vertical filtering also incurs a large number of capacity misses. More specifically, the visualization tool allows us to determine the parameters relevant to the optimization strategies, which ensures that the optimization is feasible. Our initial experimental results on several different architectures show an execution-time gain of up to 215% compared to an already optimized baseline implementation.
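The power-of-two mapping conflict described above can be illustrated with a small address-mapping sketch. It is not the implementation or tool used in this work: the function name `sets_touched_by_column`, the cache geometry (64-byte lines, 64 sets), the 4-byte element size, and the row padding of 16 elements are all assumptions chosen for illustration. The sketch counts how many distinct cache sets are touched when walking down one column of a row-major image.

```python
def sets_touched_by_column(width, rows, elem_size=4, line_size=64, num_sets=64):
    """Count distinct cache sets hit while walking one column of a
    row-major image with `width` elements per row.

    The cache geometry (line_size, num_sets) and elem_size are
    hypothetical values, not taken from the paper.
    """
    row_stride = width * elem_size  # bytes between vertically adjacent elements
    touched = set()
    for r in range(rows):
        addr = r * row_stride                      # byte offset of element (r, 0)
        touched.add((addr // line_size) % num_sets)  # set index of its cache line
    return len(touched)


# Power-of-two row width: every column element maps to the same set.
pow2 = sets_touched_by_column(1024, 64)

# Same image with each row padded by 16 elements (an assumed padding):
# consecutive column elements now fall into different sets.
padded = sets_touched_by_column(1024 + 16, 64)

print(pow2, padded)  # 1 64
```

With these assumed parameters, a 1024-element row is exactly one multiple of the span covered by all sets, so a column walk keeps evicting lines from a single set. Padding the row is one common way to spread the accesses; whether it is among the strategies applied in this work is not stated here.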
