Exploiting GPU with 3D Stacked Memory to Boost Performance for Data-Intensive Applications

A growing number of applications use GPUs for acceleration. Because these applications issue a massive number of memory accesses, traditional off-chip DRAM becomes a bandwidth bottleneck. 3D stacked memory has the potential to alleviate this bottleneck by using through-silicon vias (TSVs) to provide a much wider on-chip bus than the traditional off-chip interface. In this paper, we evaluate the latency and bandwidth benefits of 3D stacked memory on GPUs. In addition, we exploit DRAM row buffer locality to merge memory requests and further improve performance.
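The idea of merging requests by row buffer locality can be illustrated with a minimal sketch. This is not the mechanism evaluated in the paper: the row size, bank count, address interleaving, and the function names `row_key`, `merge_by_row`, and `count_activations` are all assumptions chosen for illustration. The sketch groups pending request addresses that fall in the same DRAM row and compares how many row activations an open-row policy would need before and after grouping.

```python
from collections import OrderedDict

# Assumed parameters for illustration only; none of these come from the paper.
ROW_SIZE_BYTES = 2048
NUM_BANKS = 8


def row_key(addr):
    """Map a byte address to (bank, row) under a simple assumed interleaving."""
    row_index = addr // ROW_SIZE_BYTES
    return (row_index % NUM_BANKS, row_index // NUM_BANKS)


def merge_by_row(addresses):
    """Group pending request addresses that target the same (bank, row)."""
    groups = OrderedDict()
    for addr in addresses:
        groups.setdefault(row_key(addr), []).append(addr)
    return groups


def count_activations(addresses):
    """Count row activations when requests are served in the given order,
    assuming each bank keeps its most recently used row open."""
    open_row = {}
    activations = 0
    for addr in addresses:
        bank, row = row_key(addr)
        if open_row.get(bank) != row:
            activations += 1
            open_row[bank] = row
    return activations


# Requests that alternate between two rows of the same bank.
pending = [0, 16384, 64, 16448, 128]
merged = merge_by_row(pending)
merged_order = [addr for group in merged.values() for addr in group]

in_order_activations = count_activations(pending)
merged_activations = count_activations(merged_order)
print(in_order_activations, merged_activations)
```

In this toy trace, serving requests in arrival order reopens a row on every access, while serving the merged groups opens each row once. A real memory controller would also have to respect ordering, fairness, and timing constraints that this sketch ignores.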
