Enabling partial cache line prefetching through data compression

Hardware prefetching is a simple and effective technique for hiding cache miss latency and thus improving overall performance. However, it requires additional prefetch buffers and significantly increases memory traffic. We propose a new prefetching scheme that improves performance without increasing memory traffic or requiring prefetch buffers. We observe that a significant percentage of dynamically appearing values exhibit characteristics that allow them to be compressed with a very simple compression scheme. The bandwidth freed by transferring values from lower to upper levels of the memory hierarchy in compressed form is used to prefetch additional compressible values. These prefetched values are held in the vacant space created in the data cache by storing values in compressed form. Thus, unlike other prefetching schemes, ours introduces neither prefetch buffers nor additional memory traffic. Compared with a baseline cache that does not support prefetching, our cache design on average reduces memory traffic by 10%, reduces the data cache miss rate by 14%, and speeds up program execution by 7%.
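The abstract does not spell out the "very simple compression scheme." One plausible instance of such a scheme, shown here purely as a hedged illustration (the function names and the 16-bit code width are assumptions, not taken from the paper), is significance-based compression: a 32-bit word whose upper bits are pure sign extension is a small value and can be stored in a half-word, freeing the other half of its slot for a prefetched compressible value.

```python
def compressible(word: int) -> bool:
    """Assumed criterion: a 32-bit word is compressible if its upper
    17 bits are all zeros or all ones, i.e. it is a small value whose
    upper half is just sign extension of bit 15."""
    upper = word >> 15  # top 17 bits of the 32-bit word
    return upper == 0 or upper == 0x1FFFF

def compress(word: int) -> int:
    # Keep the low 16 bits; bit 15 doubles as the sign bit,
    # since bits 15..31 are identical for compressible words.
    return word & 0xFFFF

def decompress(half: int) -> int:
    # Sign-extend bit 15 back out to 32 bits.
    if half & 0x8000:
        return half | 0xFFFF0000
    return half
```

With a scheme like this, two compressible words fit where one uncompressed word would go, which is exactly the vacant cache space the abstract uses to hold prefetched values without extra buffers or traffic.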
