GraphS: A Graph Processing Accelerator Leveraging SOT-MRAM

In this work, we present GraphS architecture, which transforms current Spin Orbit Torque Magnetic Random Access Memory (SOT-MRAM) to massively parallel computational units capable of accelerating graph processing applications. GraphS can be leveraged to greatly reduce energy consumption dealing with underlying adjacency matrix computations, eliminating unnecessary off-chip accesses and providing ultra-high internal bandwidth. The device-to-architecture co-simulation for three social network data-sets indicate roughly 3.6× higher energy-efficiency and 5.3× speed-up over recent ReRAM crossbar. It achieves ∼4× higher energy-efficiency and 5.1× speed-up over recent processing-in-DRAM acceleration methods.

[1]  Cong Xu,et al.  Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[2]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[3]  Yuan Xie,et al.  DRISA: A DRAM-based Reconfigurable In-Situ Accelerator , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[4]  David Blaauw,et al.  Compute Caches , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[5]  Shaahin Angizi,et al.  IMCE: Energy-efficient bit-wise in-memory convolution engine for deep neural network , 2018, 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC).

[6]  H. Kanaya,et al.  4Gbit density STT-MRAM using perpendicular MTJ realized with compact cell structure , 2016, 2016 IEEE International Electron Devices Meeting (IEDM).

[7]  David Blaauw,et al.  Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[8]  Kiyoung Choi,et al.  A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[9]  Yiran Chen,et al.  GraphR: Accelerating Graph Processing Using ReRAM , 2017, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[10]  Yu Wang,et al.  GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[11]  Tajana Simunic,et al.  MPIM: Multi-purpose in-memory processing using configurable resistive memory , 2017, 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC).

[12]  Cong Xu,et al.  NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[13]  Mostafa Rahimi Azghadi,et al.  A novel low-power full-adder cell with new technique in designing logical gates based on static CMOS inverter , 2009, Microelectron. J..

[14]  Ramyad Hadidi,et al.  GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[15]  Jure Leskovec,et al.  Learning to Discover Social Circles in Ego Networks , 2012, NIPS.

[16]  Jung Ho Ahn,et al.  CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[17]  Onur Mutlu,et al.  Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).