Modeling and Analyzing of 3D DRAM as L3 Cache Based on DRAMSim2

A memory system with a die-stacked DRAM L3 cache is a promising way to break the Memory Wall and improve performance. To further optimize the existing memory system, this paper models and analyzes a 3D DRAM used as an L3 cache, based on the DRAMSim2 simulator. To use on-die DRAM as a cache, tags and data are stored together in one DRAM row, and the design exploits the wider bus and higher density of 3D DRAM. The cache memory modeling platform is evaluated by running traces, generated by gem5 from SPEC2000, that simulate the access behavior of the core. With the DRAM L3 cache, all test traces show improved performance. Read operations achieve an average speed-up of 1.82× over the baseline memory system, and write operations achieve 6.38×. Throughput of the 3D DRAM cache improves by up to 1.45× over the baseline system.
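
To make the tags-and-data-in-one-row idea concrete, the following is a minimal sketch of how a row could be partitioned between data blocks and the blocks that hold their tags. The row size (2 KB), block size (64 B), and per-block tag size (6 B) are illustrative assumptions, not values taken from this paper, and `row_layout` is a hypothetical helper rather than part of DRAMSim2.

```python
def row_layout(row_bytes=2048, block_bytes=64, tag_bytes=6):
    """Return (data_blocks, tag_blocks) that fit together in one DRAM row.

    All default sizes are assumptions chosen for illustration only.
    """
    slots = row_bytes // block_bytes            # block-sized slots per row
    tags_per_block = block_bytes // tag_bytes   # tags that fit in one slot
    # Search for the largest data-block count whose tags also fit in the row.
    for data_blocks in range(slots, 0, -1):
        tag_blocks = -(-data_blocks // tags_per_block)  # ceiling division
        if data_blocks + tag_blocks <= slots:
            return data_blocks, tag_blocks
    return 0, 0


if __name__ == "__main__":
    data_blocks, tag_blocks = row_layout()
    print(f"data blocks per row: {data_blocks}, tag blocks per row: {tag_blocks}")
```

Under these assumed sizes, a single row activation exposes both the tags and the candidate data blocks, which is the motivation for co-locating them; a different tag width or row size changes the split.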
