A Processing-in-Memory Architecture Programming Paradigm for Wireless Internet-of-Things Applications

The widespread application of the wireless Internet of Things (IoT) is one of the leading factors in the emergence of Big Data. Huge amounts of data need to be transferred and processed, and the bandwidth and latency of these transfers pose a new challenge for traditional computing systems. In Big Data scenarios, moving large volumes of data affects performance, power efficiency, and reliability, the three fundamental attributes of a computing system. A change in the computing paradigm is therefore needed. Processing-in-Memory (PIM), which aims to place computation as close as possible to memory, has attracted great interest from both academia and industry. In this work, we propose a programming paradigm for PIM architectures that is suitable for wireless IoT applications. We present a data-transfer mechanism and a middleware architecture, and we describe our methods and experience in designing a simulation platform and an FPGA demo for the PIM architecture. Typical IoT applications, such as multimedia and MapReduce programs, are used to demonstrate the validity and efficiency of our method. The programs run successfully both on the simulation platform, which is built on gem5, and on the FPGA demo. Results show that our method can substantially reduce the power consumption and execution time of these programs, which is highly beneficial in IoT applications.
