Exploring the Processing-in-Memory design space

With the emergence of 3D-stacked DRAM, Processing-in-Memory (PIM) has again attracted strong interest from both the research community and industry. Here we present our observations on a subset of the PIM design space. We show how the architectural choices for PIM core frequency and cache size affect overall power consumption and energy efficiency, and we give a detailed power consumption breakdown for an ARM-like core used as a PIM core. We determine the maximum number of PIM cores that can be placed in the logic layer under a predefined power budget. We also catalog other sources of power consumption in a system with PIM, such as 3D-DRAM link power, and discuss possible power reduction techniques. We describe the shortcomings of using ARM-like cores for PIM and discuss alternative PIM core designs. Finally, we explore the optimal number of cores as a function of performance, utilization, and energy efficiency.
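The core-count question above reduces to a simple budget calculation: the largest number of cores N whose combined power, plus any fixed non-core power drawn from the same budget, stays within the logic-layer limit. The sketch below illustrates that arithmetic only. The per-core power model (dynamic power linear in frequency at fixed voltage, a constant leakage term, and a cache term proportional to capacity), the function names, and all numeric parameters are hypothetical assumptions for illustration, not the values or model used in this study.

```python
import math


def per_core_power_w(freq_ghz, dyn_w_per_ghz, static_w, cache_kb, cache_w_per_kb):
    """Hypothetical first-order per-core power model (watts).

    Assumes dynamic power scales linearly with frequency (voltage held fixed),
    plus a constant core leakage term and a cache term proportional to capacity.
    """
    return dyn_w_per_ghz * freq_ghz + static_w + cache_w_per_kb * cache_kb


def max_pim_cores(power_budget_w, core_w, overhead_w=0.0):
    """Largest integer N such that overhead_w + N * core_w <= power_budget_w."""
    if core_w <= 0:
        raise ValueError("per-core power must be positive")
    headroom = power_budget_w - overhead_w
    if headroom <= 0:
        # The fixed overhead alone already exhausts the budget.
        return 0
    return math.floor(headroom / core_w)


# Usage with purely illustrative numbers: a 1 GHz core with 32 KB of cache,
# a 10 W logic-layer budget, and 2 W of fixed non-core overhead.
core = per_core_power_w(1.0, dyn_w_per_ghz=0.2, static_w=0.05,
                        cache_kb=32, cache_w_per_kb=0.001)
print(max_pim_cores(10.0, core, overhead_w=2.0))
```

Raising the frequency or cache size in this model increases per-core power and therefore lowers the core count that fits, which is the trade-off the abstract describes between core configuration and the number of PIM cores.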
