Paper Information - Performance Implications of Processing-in-Memory Designs on Data-Intensive Applications

The popularity of data-intensive applications and recent hardware developments have driven the re-emergence of processing-in-memory (PIM), which was first explored several decades ago. To introduce PIM into a system, we must answer a fundamental question: what computation logic should be included in PIM? In terms of computational complexity, PIM can be either relatively simple and fixed-functional or fully programmable. The choice between fixed-functional PIM and programmable PIM has a direct impact on performance. In this paper, we explore the performance implications of fixed-functional PIM and programmable PIM on three data-intensive benchmarks, including a real data-intensive application. Our results show that, with PIM, we obtain a 2.09x-91.4x speedup over the no-PIM cases. However, fixed-functional PIM and programmable PIM perform differently across applications, with a performance difference of up to 90%. Neither fixed-functional PIM nor programmable PIM performs optimally in all cases. The decision on how to use PIM must therefore be based on the characteristics of the workload and of the PIM (e.g., instruction-level parallelism), and on the PIM overhead (e.g., PIM initialization and synchronization overhead).
