ReDRAM: A Reconfigurable DRAM Cache for GPGPUs

Hardware-based DRAM cache techniques for GPGPUs propose using GPU DRAM as a cache of the host (system) memory. However, these approaches miss an opportunity: allocating store-before-load data (data that is written before it is read by GPU cores) directly on GPU DRAM, which would save multiple CPU-GPU transactions. To exploit it, we propose ReDRAM, a novel memory allocation strategy for GPGPUs that reconfigures the GPU DRAM cache as a heterogeneous unit. ReDRAM allocates store-before-load data directly on GPU DRAM and also uses GPU DRAM as a cache of the host memory. Our simulation results, obtained with a modified version of GPGPU-Sim, show that for applications that use store-before-load data, ReDRAM improves performance by 57.6 percent on average and by up to 4.85x compared with existing proposals for state-of-the-art GPU DRAM caches.
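The saving described above can be illustrated with a toy transaction counter. This is a minimal sketch under stated assumptions, not the ReDRAM design: the abstract does not say how store-before-load data is detected or at what granularity, so the sketch assumes page granularity, classifies a page by its first GPU access ("first touch"), assumes unlimited GPU DRAM capacity (no eviction or write-back), and assumes the baseline DRAM cache fills a page from host memory on any miss, including a write miss. The names `DramModel`, `Region`, and `run` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Region(Enum):
    """Where a page lives in the (hypothetical) reconfigured GPU DRAM."""
    HOST_CACHE = "host-cache"  # GPU DRAM acting as a cache of host memory
    GPU_NATIVE = "gpu-native"  # store-before-load data allocated directly on GPU DRAM


@dataclass
class DramModel:
    """Toy model that counts host-to-GPU fill transactions.

    Assumptions (not from the paper): page granularity, first-touch
    classification, unlimited capacity, no eviction or write-back.
    """
    bypass_store_before_load: bool  # True: ReDRAM-like policy; False: plain DRAM cache
    regions: dict = field(default_factory=dict)  # page -> Region
    host_fills: int = 0  # CPU-GPU transactions spent filling pages from host memory

    def access(self, page: int, is_store: bool) -> None:
        if page in self.regions:
            return  # hit: the page is already resident in GPU DRAM
        if is_store and self.bypass_store_before_load:
            # First touch is a write, so the host copy is never read:
            # allocate the page directly on GPU DRAM without a host fill.
            self.regions[page] = Region.GPU_NATIVE
        else:
            # Miss handled as a DRAM-cache fill from host memory.
            self.regions[page] = Region.HOST_CACHE
            self.host_fills += 1


def run(trace, bypass: bool) -> int:
    """Replay a trace of (page, is_store) accesses and return the host fill count."""
    model = DramModel(bypass_store_before_load=bypass)
    for page, is_store in trace:
        model.access(page, is_store)
    return model.host_fills


# Pages 0-1 are read-only inputs; pages 10-11 are outputs written before they are read.
trace = [(0, False), (1, False), (10, True), (11, True), (10, False), (11, False)]
print(run(trace, bypass=False))  # 4: every first touch triggers a host fill
print(run(trace, bypass=True))   # 2: only the input pages are fetched from host memory
```

In this toy trace, the two output pages never need a host copy, so treating them as GPU-native halves the fill transactions. Real gains would depend on capacity pressure, eviction, and write-back traffic, which the sketch deliberately ignores.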
