Adaptive Bandwidth Management for Performance-Temperature Trade-offs in Heterogeneous HMC+DDRx Memory

High fabrication cost per bit and thermal issues are the main reasons that prevent architects from using 3D-DRAM alone as the main memory. In this paper we address this issue by proposing a heterogeneous memory system that combines a DDRx DRAM with an emerging 3D hybrid memory cube (HMC) technology. Bandwidth and temperature management are the challenging issues for such heterogeneous memory architecture. To address these challenges, first we introduce a memory page allocation policy for the heterogeneous memory system to maximize performance. Then, using the proposed memory page allocation policy, we propose a temperature-aware algorithm that adaptively distributes the requested bandwidth between HMC and DDRx DRAM to reduce the thermal hotspot while maintaining high performance. The results show that the proposed memory page allocation policy can utilize the memory bandwidth close to 99% of the ideal bandwidth utilization. Moreover our temperate-aware bandwidth adaptation reduces the average steady-state temperature of the HMC hotspot across various workloads by 4.5oK while incurring 2.5% performance overhead.

[1]  Kevin Skadron,et al.  Temperature-aware microarchitecture , 2003, ISCA '03.

[2]  Houman Homayoun,et al.  Enabling dynamic heterogeneity through core-on-core stacking , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).

[3]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[4]  Dean M. Tullsen,et al.  Symbiotic jobscheduling with priorities for a simultaneous multithreading processor , 2002, SIGMETRICS '02.

[5]  Jie Meng,et al.  Optimizing energy efficiency of 3-D multicore systems with stacked DRAM under power and thermal constraints , 2012, DAC Design Automation Conference 2012.

[6]  Mor Harchol-Balter,et al.  ATLAS : A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers , 2010 .

[7]  Young-Hyun Jun,et al.  8 Gb 3-D DDR3 DRAM Using Through-Silicon-Via Technology , 2009, IEEE Journal of Solid-State Circuits.

[8]  Kenneth Rose,et al.  Impacts of though-DRAM vias in 3D processor-DRAM integrated systems , 2009, 2009 IEEE International Conference on 3D System Integration.

[9]  Gabriel H. Loh,et al.  Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[10]  Houman Homayoun,et al.  Temperature aware thread migration in 3D architecture with stacked DRAM , 2013, International Symposium on Quality Electronic Design (ISQED).

[11]  Dean M. Tullsen,et al.  Fellowship - Simulation And Modeling Of A Simultaneous Multithreading Processor , 1996, Int. CMG Conference.

[12]  Anne Rogers,et al.  Supporting dynamic data structures on distributed-memory machines , 1995, TOPL.

[13]  J. Jeddeloh,et al.  Hybrid memory cube new DRAM architecture increases density and performance , 2012, 2012 Symposium on VLSI Technology (VLSIT).

[14]  Yuan Xie,et al.  Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[15]  Bruce Jacob,et al.  DRAMSim2: A Cycle Accurate Memory System Simulator , 2011, IEEE Computer Architecture Letters.

[16]  Mikko H. Lipasti,et al.  Data compression for thermal mitigation in the Hybrid Memory Cube , 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).

[17]  David H. Bailey,et al.  The Nas Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..

[18]  Houman Homayoun,et al.  Heterogeneous memory management for 3D-DRAM and external DRAM with QoS , 2013, 2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC).

[19]  Sarita V. Adve,et al.  AS SCALING THREATENS TO ERODE RELIABILITY STANDARDS, LIFETIME RELIABILITY MUST BECOME A FIRST-CLASS DESIGN CONSTRAINT. MICROARCHITECTURAL INTERVENTION OFFERS A NOVEL WAY TO MANAGE LIFETIME RELIABILITY WITHOUT SIGNIFICANTLY SACRIFICING COST AND PERFORMANCE , 2005 .

[20]  J. Thomas Pawlowski,et al.  Hybrid memory cube (HMC) , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).

[21]  Giovanni De Micheli,et al.  Temperature-aware runtime power management for chip-multiprocessors with 3-D stacked cache , 2014, Fifteenth International Symposium on Quality Electronic Design.