Data Locality Optimization Based on Comprehensive Knowledge of the Cache Miss Reason: A Case Study with DWT

The overall performance of a computing system increasingly depends on the efficient use of cache memories. Traditional approaches to cache tuning deploy performance tools that help the user optimize the source program toward better runtime data locality. Following this conventional approach, we developed a toolset comprising data profiling, pattern analysis, and performance visualization tools. This paper demonstrates step by step how the toolset can be used to understand the cache access behavior of an application and then to produce optimized program code. The Discrete Wavelet Transform (DWT), a commonly used algorithm for image and video compression, serves as the example. Our initial experimental results with this sample application show a speedup of up to 3.19 in execution time over the original implementation.
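The following C sketch shows one level of a 2D DWT on a row-major image and illustrates the kind of locality problem such a transform raises. It is not the paper's code. The Haar averaging/differencing filter, the hypothetical function names (`haar_rows`, `haar_cols_naive`, `haar_cols_interchanged`), and the use of loop interchange as the optimization are all assumptions chosen for brevity. The row pass reads memory contiguously. A column pass written column by column jumps `w` elements between consecutive accesses, which tends to waste most of each fetched cache line for wide images. Interchanging the loops yields the same coefficients with unit-stride inner loops.

```c
#include <assert.h>
#include <stddef.h>

/* One level of a 2D Haar DWT (averaging/differencing variant, assumed for
   illustration). The image is row-major, w columns by h rows, both even.
   Low-pass coefficients go to the first half, high-pass to the second. */

/* Row pass: each row is contiguous in memory, so access is unit-stride. */
static void haar_rows(const double *in, double *out, int w, int h) {
    assert(w % 2 == 0 && h % 2 == 0);
    int half = w / 2;
    for (int y = 0; y < h; y++) {
        const double *r = in + (size_t)y * w;
        double *o = out + (size_t)y * w;
        for (int i = 0; i < half; i++) {
            o[i]        = (r[2 * i] + r[2 * i + 1]) / 2.0; /* low-pass  */
            o[half + i] = (r[2 * i] - r[2 * i + 1]) / 2.0; /* high-pass */
        }
    }
}

/* Column pass, naive order: walks down one column at a time, so each
   access is w doubles away from the previous one (stride-w traversal). */
static void haar_cols_naive(const double *in, double *out, int w, int h) {
    assert(w % 2 == 0 && h % 2 == 0);
    int half = h / 2;
    for (int x = 0; x < w; x++) {
        for (int j = 0; j < half; j++) {
            double a = in[(size_t)(2 * j) * w + x];
            double b = in[(size_t)(2 * j + 1) * w + x];
            out[(size_t)j * w + x]          = (a + b) / 2.0;
            out[(size_t)(half + j) * w + x] = (a - b) / 2.0;
        }
    }
}

/* Column pass with interchanged loops: handles one pair of rows at a time,
   so the inner loop reads and writes contiguous memory. The arithmetic per
   output element is identical to haar_cols_naive. */
static void haar_cols_interchanged(const double *in, double *out, int w, int h) {
    assert(w % 2 == 0 && h % 2 == 0);
    int half = h / 2;
    for (int j = 0; j < half; j++) {
        const double *ra = in + (size_t)(2 * j) * w;
        const double *rb = in + (size_t)(2 * j + 1) * w;
        double *lo = out + (size_t)j * w;
        double *hi = out + (size_t)(half + j) * w;
        for (int x = 0; x < w; x++) {
            lo[x] = (ra[x] + rb[x]) / 2.0;
            hi[x] = (ra[x] - rb[x]) / 2.0;
        }
    }
}
```

Running `haar_rows` followed by either column pass on the same buffer should give bit-identical results, because both versions compute each coefficient with the same operations. Only the memory traversal order differs. Whether the interchanged version is faster, and by how much, depends on image size and cache geometry. That would have to be measured, for example with cache-miss profiling of the kind the paper describes.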
