The Case for Associative DRAM Caches

In-package DRAM caches are a promising new development that may enable the continued scaling of main memory by facilitating the creation of multi-level memory systems that can effectively utilize dense non-volatile memory technologies. However, determining an appropriate storage scheme for the large amount of meta-data needed by these new caches has proven to be difficult. As a result, prior work has suggested that associativity, with its additional metadata requirements, may not be well suited for use in large in-package DRAM caches. This work makes the case that despite these problems, associativity is still a desirable feature for DRAM caches by demonstrating the benefits of associativity for a wide range of cache configurations and workloads.

[1]  Gabriel H. Loh,et al.  Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[2]  Babak Falsafi,et al.  Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache , 2013, ISCA.

[3]  Jinkyu Jeong,et al.  A fully associative, tagless DRAM cache , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[4]  David H. Bailey,et al.  The Nas Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..

[5]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[6]  Vijayalakshmi Srinivasan,et al.  Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.

[7]  Onur Mutlu,et al.  Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management , 2012, IEEE Computer Architecture Letters.

[8]  David H. Bailey,et al.  The Nas Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..

[9]  Cheng-Chieh Huang,et al.  ATCache: Reducing DRAM cache latency via a small SRAM tag cache , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).

[10]  Rami G. Melhem,et al.  Using PCM in Next-generation Embedded Space Applications , 2010, 2010 16th IEEE Real-Time and Embedded Technology and Applications Symposium.

[11]  Michael Wu,et al.  eNVy: a non-volatile, main memory storage system , 1994, ASPLOS VI.

[12]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[13]  Mark D. Hill,et al.  Efficiently enabling conventional block sizes for very large die-stacked DRAM caches , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[14]  B. Jacob,et al.  AN INTEgRATED SIMulATIoN INfRASTRuCTuRE foR THE ENTIRE MEMoRy HIERARCHy: CACHE, DRAM, NoNVolATIlE MEMoRy, AND DISk , 2013 .

[15]  Mikko H. Lipasti,et al.  Tag tables , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[16]  Jörg Henkel,et al.  Simultaneously optimizing DRAM cache hit latency and miss rate via novel set mapping policies , 2013, 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES).

[17]  Shunfei Chen,et al.  MARSS: A full system simulator for multicore x86 CPUs , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).

[18]  Babak Falsafi,et al.  Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[19]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.