Conflict Avoiding Caches Invite New Data Layout Optimizations

Cache performance can be seriously degraded by conflict misses, which occur when too many addresses in the working set map to the same cache sets. Past research has investigated myriad techniques to remove conflict misses. Software-driven optimizations (e.g., padding and blocking) reorganize data or statements in a program to improve locality. Hardware optimizations include hashed indexing, in which the set index is computed with an XOR-based hash function instead of conventional modulo indexing. It has been repeatedly shown that both software and hardware optimizations can effectively remove conflict misses. First, we confirm on a set of numerical kernels that hashing removes most conflict misses and avoids the unusually high miss rates that occur for pathological data layouts. Second, we show that data layout optimizations such as intra-variable padding do not consistently outperform caches with hashing. Furthermore, these optimizations provide only marginal improvements for caches with hashing. Caches with hashed set index functions allow new data layout optimizations that are meaningless in modulo-indexed caches. This paper introduces base address optimization and shows that it reduces the number of conflict misses by over 15% on average for numerical kernels. Optimizing the base address is at least as powerful as intra-variable padding, and it removes additional misses when applied together with intra-variable padding or blocking.
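The contrast between modulo and hashed indexing, and the reason a variable's base address only matters under hashing, can be illustrated with a minimal sketch. The cache geometry (32-byte lines, 128 sets) and the particular hash (XOR of the index bits with the next-higher address bits) are assumptions chosen for illustration; they are not the set index functions evaluated in the paper.

```python
# Illustrative cache geometry (assumed, not taken from the paper).
BLOCK_BITS = 5                 # 32-byte cache lines
INDEX_BITS = 7                 # 128 sets
NUM_SETS = 1 << INDEX_BITS


def modulo_index(addr):
    """Conventional indexing: low-order bits of the block address."""
    return (addr >> BLOCK_BITS) & (NUM_SETS - 1)


def xor_index(addr):
    """A simple XOR-based hash (assumed): index bits XOR next-higher bits."""
    block = addr >> BLOCK_BITS
    low = block & (NUM_SETS - 1)
    high = (block >> INDEX_BITS) & (NUM_SETS - 1)
    return low ^ high


def sets_touched(index_fn, base, stride, count):
    """Number of distinct sets hit by `count` accesses starting at `base`."""
    return len({index_fn(base + i * stride) for i in range(count)})


# A pathological stride: exactly NUM_SETS cache lines (4096 bytes).
stride = NUM_SETS << BLOCK_BITS

modulo_sets = sets_touched(modulo_index, 0, stride, 64)
xor_sets = sets_touched(xor_index, 0, stride, 64)
print("modulo indexing touches", modulo_sets, "set(s)")
print("XOR indexing touches", xor_sets, "set(s)")

# Shifting a base address by this stride is invisible to modulo indexing
# but changes the set under the XOR hash.
addr = 0x1234
print(modulo_index(addr) == modulo_index(addr + stride))
print(xor_index(addr) == xor_index(addr + stride))
```

In this sketch the strided accesses all collide in one set under modulo indexing and spread over 64 sets under the XOR hash. Moving a base address by the same stride leaves the modulo index unchanged but alters the hashed index, which is why choosing base addresses becomes a meaningful layout decision only for hashed caches.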
