Analyzing the optimal ratio of SRAM banks in hybrid caches

Cache memories have been typically implemented with Static Random Access Memory (SRAM) technology. This technology presents a fast access time but high energy consumption and low density. As opposite, the recently appeared embedded Dynamic RAM (eDRAM) technology allows caches to be built with lower energy and area, although with a slower access time. The eDRAM technology provides important leakage and area savings, especially in huge Last-Level Caches (LLCs), which occupy almost half the silicon area in some recent microprocessors. This paper proposes a novel hybrid LLC, which combines SRAM and eDRAM banks to address the trade-off among performance, energy, and area. To this end, we explore the optimal percentage of SRAM and eDRAM banks that achieves the best target trade-off. Architectural mechanisms have been devised to keep the most likely accessed blocks in fast SRAM banks as well as to avoid unnecessary destructive reads. Experimental results show that, compared to a conventional SRAM LLC with the same storage capacity, performance degradation does not surpass, on average, 2.9% (even with 12.5% of banks built with SRAM technology), whereas area savings can be as high as 46% for a 1MB-16way LLC. For a 45nm technology node, the energy-delay squared product confirms that a hybrid cache is a better design than the conventional SRAM cache regardless the number of eDRAM banks, and also better than a conventional eDRAM cache when the number of SRAM banks is a quarter or an eighth of the cache banks.

[1]  Feng Lin,et al.  DRAM Circuit Design: Fundamental and High-Speed Topics , 2007 .

[2]  Pedro López,et al.  An hybrid eDRAM/SRAM macrocell to implement first-level data caches , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[3]  Pedro López,et al.  Impact on Performance and Energy of the Retention Time and Processor Frequency in L1 Macrocell-Based Data Caches , 2012, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[4]  Shyamkumar Thoziyoor,et al.  CACTI 5 . 1 , 2008 .

[5]  Balaram Sinharoy,et al.  POWER5 system microarchitecture , 2005, IBM J. Res. Dev..

[6]  Balaram Sinharoy,et al.  POWER4 system microarchitecture , 2002, IBM J. Res. Dev..

[7]  Eric M. Schwarz,et al.  IBM POWER6 microarchitecture , 2007, IBM J. Res. Dev..

[8]  David R. Kaeli,et al.  Exploiting temporal locality in drowsy cache policies , 2005, CF '05.

[9]  Xiaoxia Wu,et al.  Hybrid cache architecture with disparate memory technologies , 2009, ISCA '09.

[10]  Richard E. Matick,et al.  Logic-based eDRAM: Origins and rationale for use , 2005, IBM J. Res. Dev..

[11]  Yan Solihin,et al.  CHOP: Adaptive filter-based DRAM caching for CMP server platforms , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[12]  David M. Brooks,et al.  Implementing a hybrid SRAM / eDRAM NUCA architecture , 2011, 2011 18th International Conference on High Performance Computing.

[13]  M. Wordeman,et al.  An 800-MHz embedded DRAM with a concurrent refresh mode , 2005, IEEE Journal of Solid-State Circuits.

[14]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[15]  John Wilkes,et al.  Hewlett-Packard Laboratories , 1988 .

[16]  Balaram Sinharoy,et al.  IBM POWER7 multicore server processor , 2011 .

[17]  Pedro López,et al.  Design, Performance, and Energy Consumption of eDRAM/SRAM Macrocells for L1 Data Caches , 2012, IEEE Transactions on Computers.

[18]  G. Edward Suh,et al.  SRAM-DRAM hybrid memory with applications to efficient register files in fine-grained multi-threading , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).