A comparative analysis of performance improvement schemes for cache memories

Numerous techniques have been proposed in the literature to improve the performance of cache memories by reducing cache conflicts. These techniques were proposed over the past decade, and each proposal independently claimed to reduce conflict misses. However, because the published results used different benchmarks and different experimental setups, the techniques are not easy to compare. In this paper we report a side-by-side comparison of these techniques. We also evaluate the suitability of some of these techniques for caches with higher set associativities. In addition to evaluating the techniques for their impact on cache misses and average memory access times, we evaluate their ability to reduce the non-uniformity of cache accesses. We conclude that each application may benefit from a different technique and that no single scheme works universally well for all applications. We also observe that, for the majority of applications, XORing (XOR) and odd-multiplier indexing schemes perform reasonably well. Among programmable associativity techniques, B-cache performs better than column-associative and adaptive caches, but column-associative caches require only minimal hardware extensions. Uniformity of cache accesses is improved most by the B-cache technique, although the column-associative cache also improves access uniformity. Based on the observation that different techniques benefit different applications, we explored the use of multiple programmable addressing mechanisms, with each addressing scheme designed for a specific application. We include some preliminary data using multiple addressing schemes.
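
As a rough illustration of the XOR and odd-multiplier indexing schemes named in the abstract, the Python sketch below compares them with conventional modulo indexing on a stride that maps every block to the same set. The cache geometry (32-byte lines, 256 sets), the multiplier 9, and all function names are assumptions chosen for this example. The formulas follow commonly described forms of these schemes, not necessarily the exact configurations evaluated in the paper.

```python
# Illustrative sketch only: geometry, multiplier, and names are assumptions.
OFFSET_BITS = 5   # assumed 32-byte cache lines
INDEX_BITS = 8    # assumed 256 sets


def split_address(addr, offset_bits=OFFSET_BITS, index_bits=INDEX_BITS):
    """Return (tag, index) for a byte address under conventional indexing."""
    block = addr >> offset_bits
    index = block & ((1 << index_bits) - 1)
    tag = block >> index_bits
    return tag, index


def conventional_index(addr, offset_bits=OFFSET_BITS, index_bits=INDEX_BITS):
    """Conventional indexing: the low-order bits of the block address."""
    _, index = split_address(addr, offset_bits, index_bits)
    return index


def xor_index(addr, offset_bits=OFFSET_BITS, index_bits=INDEX_BITS):
    """XOR indexing: index bits XORed with the low-order tag bits."""
    tag, index = split_address(addr, offset_bits, index_bits)
    mask = (1 << index_bits) - 1
    return index ^ (tag & mask)


def odd_multiplier_index(addr, multiplier=9,
                         offset_bits=OFFSET_BITS, index_bits=INDEX_BITS):
    """Odd-multiplier indexing: (multiplier * tag + index) mod number of sets."""
    if multiplier % 2 == 0:
        raise ValueError("multiplier must be odd")
    tag, index = split_address(addr, offset_bits, index_bits)
    mask = (1 << index_bits) - 1
    return (multiplier * tag + index) & mask


def sets_touched(index_fn, addresses):
    """Number of distinct sets that a list of addresses maps to."""
    return len({index_fn(a) for a in addresses})


if __name__ == "__main__":
    # Eight blocks spaced one cache size apart (32 B * 256 sets = 8 KB).
    stride = (1 << OFFSET_BITS) * (1 << INDEX_BITS)
    addresses = [k * stride for k in range(8)]
    for fn in (conventional_index, xor_index, odd_multiplier_index):
        print(fn.__name__, sets_touched(fn, addresses))
```

For this particular stride, conventional indexing places all eight blocks in a single set, while both alternative schemes spread them over eight distinct sets. Other access patterns can favor a different scheme, which is consistent with the abstract's observation that no single technique works well for every application.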
