Thermal Feasibility of Die-Stacked Processing in Memory

Processing in memory (PIM) implemented via 3D die stacking has recently been proposed to narrow the widening gap between processor and memory performance. By moving computation that demands high memory bandwidth to the base logic die of a 3D memory stack, PIM promises significant improvements in energy efficiency. However, this vision could be derailed if the processor(s) on the logic die raise the stack’s temperature to unacceptable levels. In this paper, we study the thermal constraints on PIM across different processor organizations and cooling solutions, and we show the range of designs that are viable under different conditions. We also demonstrate that PIM is feasible even with low-end, fanless cooling solutions. We believe these results help alleviate concerns about the thermal feasibility of PIM and identify viable design points, thereby encouraging further research into novel PIM architectures, technologies, and use cases.
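To make the idea of a thermal constraint concrete, the sketch below uses a first-order, single-resistance thermal model: steady-state stack temperature equals ambient temperature plus total stack power times the cooling solution's thermal resistance. Inverting that relation gives the largest logic-die (PIM) power budget that keeps the stack under a temperature ceiling. This is not the paper's methodology; detailed studies model individual die layers and hotspots with tools such as HotSpot. The 85 °C ceiling is an assumption based on the JEDEC normal operating limit for DRAM, and the ambient temperature, DRAM power, and thermal-resistance values are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Assumed stack temperature ceiling: the JEDEC normal operating limit for DRAM (°C).
DRAM_T_MAX_C = 85.0


@dataclass
class Cooling:
    name: str
    r_th_c_per_w: float  # hypothetical lumped thermal resistance to ambient (°C/W)


def peak_temp_c(t_ambient_c: float, pim_power_w: float,
                dram_power_w: float, cooling: Cooling) -> float:
    """Steady-state stack temperature from a single lumped thermal resistance."""
    return t_ambient_c + (pim_power_w + dram_power_w) * cooling.r_th_c_per_w


def max_pim_power_w(t_ambient_c: float, dram_power_w: float, cooling: Cooling,
                    t_max_c: float = DRAM_T_MAX_C) -> float:
    """Largest logic-die (PIM) power that keeps the stack at or below t_max_c.

    Returns 0.0 when no positive PIM power fits within the thermal budget.
    """
    total_budget_w = (t_max_c - t_ambient_c) / cooling.r_th_c_per_w
    return max(total_budget_w - dram_power_w, 0.0)


if __name__ == "__main__":
    # Illustrative inputs only: 45 °C in-chassis ambient, 5 W of DRAM power.
    for cooling in (Cooling("fanless heat sink", 2.0), Cooling("active heat sink", 0.5)):
        print(f"{cooling.name}: {max_pim_power_w(45.0, 5.0, cooling):.1f} W")
    # fanless heat sink: 15.0 W
    # active heat sink: 75.0 W
```

Under these assumed numbers, cutting the thermal resistance by 4x raises the PIM power budget from 15 W to 75 W, which illustrates why the viable design space depends so strongly on the cooling solution.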
