Improving Node-Level MapReduce Performance Using Processing-in-Memory Technologies

Processing-in-Memory (PIM) is the concept of moving computation as close as possible to memory. This decreases the need for the movement of data between central processor and memory system, hence improves energy efficiency from the reduced memory traffic. In this paper we present our approach on how to embed processing cores in 3D-stacked memories, and evaluate the use of such a system for Big Data analytics. We present a simple server architecture, which employs several energy efficient PIM cores in multiple 3D-DRAM units where the server acts as a node of a cluster for Big Data analyses utilizing MapReduce programming framework. Our preliminary analyses show that on a single node up to 23% energy savings on the processing units can be achieved while reducing execution time by up to 8.8%. Additional energy savings can result from simplifying the system memory buses. We believe such energy efficient systems with PIM capability will become viable in the near future because of the potential to scale the memory wall.

[1]  John D. Owens,et al.  Multi-GPU MapReduce on GPU Clusters , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[2]  Naga K. Govindaraju,et al.  Mars: A MapReduce Framework on graphics processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[3]  Lei Jiang,et al.  Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[4]  Gabriel H. Loh Nuwan Jayasena Mark H. Oskin Mark Nutter Da Ignatowski A Processing-in-Memory Taxonomy and a Case for Studying Fixed-function PIM , 2013 .

[5]  Christoforos E. Kozyrakis,et al.  A case for intelligent RAM , 1997, IEEE Micro.

[6]  Steven J. Plimpton,et al.  MapReduce in MPI for Large-scale graph algorithms , 2011, Parallel Comput..

[7]  Josep Torrellas,et al.  FlexRAM: Toward an advanced Intelligent Memory system: A retrospective paper , 2012, 2012 IEEE 30th International Conference on Computer Design (ICCD).

[8]  Robert Morris,et al.  Optimizing MapReduce for Multicore Architectures , 2010 .

[9]  Chun Chen,et al.  The architecture of the DIVA processing-in-memory chip , 2002, ICS '02.

[10]  Mike Ignatowski,et al.  A new perspective on processing-in-memory architecture design , 2013, MSPC '13.

[11]  Feifei Li,et al.  NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[12]  Mike Ignatowski,et al.  TOP-PIM: throughput-oriented programmable processing in memory , 2014, HPDC '14.

[13]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[14]  Babak Falsafi,et al.  A Case for Specialized Processors for Scale-Out Workloads , 2014, IEEE Micro.

[15]  Justin Talbot,et al.  Phoenix++: modular MapReduce for shared-memory systems , 2011, MapReduce '11.

[16]  Krishna M. Kavi,et al.  Intelligent memory manager: Reducing cache pollution due to memory management functions , 2006, J. Syst. Archit..

[17]  Haibo Chen,et al.  Tiled-MapReduce: Efficient and Flexible MapReduce Processing on Multicore with Tiling , 2013, TACO.