ERUCA: Efficient DRAM Resource Utilization and Resource Conflict Avoidance for Memory System Parallelism

Memory system performance is measured by access latency and bandwidth, and DRAM access parallelism critically impacts for both. To improve DRAM parallelism, previous research focused on increasing the number of effective banks by sub-dividing one physical bank. We find that without avoiding conflicts on the shared resources among (sub)banks, the benefits are limited. We propose mechanisms for efficient DRAM resource utilization and resource-conflict avoidance (ERUCA). ERUCA reduces conflicts on shared (sub)bank resources utilizing row address locality between sub-banks and improving the DRAM chip-level data bus. Area overhead for ERUCA is kept near zero with a unique implementation that exploits under-utilized resources available in commercial DRAM chips. Overall ERUCA provides 15% speedup while incurring <0.3% DRAM die area overhead.

[1]  Seth H. Pugsley,et al.  USIMM : the Utah SImulated Memory Module , 2012 .

[2]  Eduardo Pinheiro,et al.  DRAM errors in the wild: a large-scale field study , 2009, SIGMETRICS '09.

[3]  Timothy J. Dell,et al.  A white paper on the benefits of chipkill-correct ecc for pc server main memory , 1997 .

[4]  M. Horiguchi,et al.  Redundancy techniques for high-density DRAMs , 1997, 1997 Proceedings Second Annual IEEE International Conference on Innovative Systems in Silicon.

[5]  Thomas Vogelsang,et al.  Understanding the Energy Consumption of Dynamic Random Access Memories , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[6]  Zhao Zhang,et al.  A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality , 2000, MICRO 33.

[7]  Feng Lin,et al.  DRAM Circuit Design: Fundamental and High-Speed Topics , 2007 .

[8]  Onur Mutlu,et al.  A case for exploiting subarray-level parallelism (SALP) in DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[9]  宋清基 Memory and memory module including the same , 2014 .

[10]  Lizy Kurian John,et al.  Minimalist open-page: A DRAM page-mode scheduling policy for the many-core era , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[11]  Jung Ho Ahn,et al.  CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[12]  Stefan Mangard,et al.  Reverse Engineering Intel DRAM Addressing and Exploitation , 2015, ArXiv.

[13]  Dae-Hyun Kim,et al.  ArchShield: architectural framework for assisting DRAM scaling by tolerating high error rates , 2013, ISCA.

[14]  Michael M. Swift,et al.  Efficient virtual memory for big memory servers , 2013, ISCA.

[15]  Yunsaing Kim,et al.  A 1.2V 38nm 2.4Gb/s/pin 2Gb DDR4 SDRAM with bank group and ×4 half-page architecture , 2012, 2012 IEEE International Solid-State Circuits Conference.

[16]  William J. Dally,et al.  Architecting an Energy-Efficient DRAM System for GPUs , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[17]  Jongmoo Choi,et al.  Decoupled Direct Memory Access: Isolating CPU and IO Traffic by Leveraging a Dual-Data-Port DRAM , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).

[18]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[19]  O Seongil,et al.  Microbank: Architecting Through-Silicon Interposer-Based Main Memory Systems , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.

[20]  Dean M. Tullsen,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[21]  Tao Zhang,et al.  Half-DRAM: A high-bandwidth and low-power DRAM architecture from the rethinking of fine-grained activation , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[22]  Jae-Hyung Lee,et al.  A 7 Gb/s/pin 1 Gbit GDDR5 SDRAM With 2.5 ns Bank to Bank Active Time and No Bank Group Restriction , 2011, IEEE Journal of Solid-State Circuits.

[23]  Masashi Horiguchi,et al.  A flexible redundancy technique for high-density DRAMs , 1991 .

[24]  Youngjin Kwon,et al.  Coordinated and Efficient Huge Page Management with Ingens , 2016, OSDI.

[25]  Mark Horowitz,et al.  Improving energy efficiency of DRAM by exploiting half page row access , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[26]  Lieven Eeckhout,et al.  Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[27]  Bruce Jacob,et al.  Fine-Grained Activation for Power Reduction in DRAM , 2010, IEEE Micro.

[28]  Kevin Skadron,et al.  Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).