Exploring new features of high-bandwidth memory for GPUs

Because GDDR5 is constrained by off-chip I/O pin count and power, high-bandwidth memory (HBM) has been proposed to provide higher bandwidth and lower power consumption for GPUs. In this paper, we first provide a detailed comparison between HBM and GDDR5 and expose two unique features of HBM: dual-command and pseudo channel mode. Second, we analyze the effectiveness of these two features and show that neither, on its own, notably improves performance. However, by combining pseudo channel mode with a cache architecture that supports fine-grained cache-line management, such as Amoeba cache, we achieve high efficiency for applications with irregular memory requests. Our experiments demonstrate that, compared with an Amoeba cache in legacy mode, an Amoeba cache with pseudo channel mode improves GPU performance by 25% and reduces HBM energy consumption by 15%.
