ReDRAM: A Reconfigurable DRAM Cache for GPGPUs

Hardware-based DRAM cache techniques for GPGPUs propose using GPU DRAM as a cache of the host (system) memory. However, these approaches miss the opportunity to allocate store-before-load data (data that GPU cores write before they read it) directly in GPU DRAM, which would save multiple CPU-GPU transactions. We propose ReDRAM, a novel memory allocation strategy for GPGPUs that reconfigures the GPU DRAM cache as a heterogeneous unit: it allows store-before-load data to be allocated directly in GPU DRAM and also uses GPU DRAM as a cache of the host memory. Simulation results from a modified version of GPGPU-Sim show that, for applications that use store-before-load data, ReDRAM improves performance by 57.6 percent on average and by up to 4.85x compared with existing state-of-the-art GPU DRAM cache proposals.
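To make the store-before-load idea concrete, the following is a toy Python sketch, not ReDRAM's actual mechanism. It assumes a page-granularity access trace, unlimited GPU DRAM capacity, and a cache-only baseline that fetches a page from host memory on its first touch, even when that touch is a store. The names `store_before_load_pages` and `host_fetches` are hypothetical, and the sketch identifies store-before-load pages by inspecting the whole trace in advance, which a real design could not do this way.

```python
from typing import Dict, List, Set, Tuple

# One GPU memory access: ("load" or "store", page number). Hypothetical format.
Access = Tuple[str, int]


def store_before_load_pages(trace: List[Access]) -> Set[int]:
    """Return the pages whose first access in the trace is a store."""
    first_op: Dict[int, str] = {}
    for op, page in trace:
        first_op.setdefault(page, op)  # remember only the first access per page
    return {page for page, op in first_op.items() if op == "store"}


def host_fetches(trace: List[Access], direct_alloc: bool) -> int:
    """Count pages fetched from host memory on first touch.

    direct_alloc=False: cache-only baseline (assumed), every first touch
    fetches the page from the host.
    direct_alloc=True: store-before-load pages are allocated in GPU DRAM
    without a host fetch; all other pages are still cached from the host.
    """
    skip = store_before_load_pages(trace) if direct_alloc else set()
    seen: Set[int] = set()
    fetches = 0
    for _, page in trace:
        if page in seen:
            continue  # page already resident (capacity is assumed unlimited)
        seen.add(page)
        if page not in skip:
            fetches += 1
    return fetches


trace = [("store", 0), ("load", 0), ("load", 1),
         ("store", 2), ("store", 1), ("load", 2)]
print(sorted(store_before_load_pages(trace)))   # pages 0 and 2
print(host_fetches(trace, direct_alloc=False))  # 3 host fetches
print(host_fetches(trace, direct_alloc=True))   # 1 host fetch
```

In this toy trace, pages 0 and 2 are written before they are read, so allocating them directly avoids two of the three host fetches; page 1 is read first and is still brought in from host memory.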
