ApproxPIM: Exploiting realistic 3D-stacked DRAM for energy-efficient processing in-memory

Processing-in-Memory (PIM) has recently been revisited as one of the most promising solutions to the bandwidth and power walls between processor and memory. In this paper, we propose a lightweight PIM architecture, approxPIM, which leverages approximate computing techniques to enable in-memory processing in a realistic 3D-stacked DRAM, Micron's Hybrid Memory Cube (HMC). Using the HMC's newly released atomic instruction support, approxPIM can process a wide range of data-intensive applications without adding any logic resources to the memory devices. Furthermore, we propose to approximate accuracy-insensitive applications with the limited set of HMC commands so that they can be mapped smoothly to the HMCs without interference from the processors, thereby enabling energy-efficient Processing-in-Memory and greatly expanding the scope of target PIM applications for the HMC. Overall, approxPIM provides a comprehensive study of the HMC's potential and weaknesses for Processing-in-Memory. Evaluation results show that approxPIM significantly improves the energy efficiency and performance of the whole system.
