RAGra: Leveraging Monolithic 3D ReRAM for Massively-Parallel Graph Processing

With the maturity of monolithic 3D integration, 3D ReRAM provides impressive storage density and computational parallelism, offering great opportunities for accelerating parallel graph processing. In this paper, we present RAGra, a 3D ReRAM-based graph processing accelerator with two significant technical highlights. First, monolithic 3D ReRAM typically has a complexly intertwined structure in which input wordlines and output bitlines are shared across layers. We propose novel mapping schemes that guide the mapping of different graph algorithms onto 3D ReRAM seamlessly and correctly, exposing the inherently irregular parallelism of 3D ReRAM. Second, considering the sparsity of real-world graphs, we further propose a row- and column-mixed execution model that filters out invalid subgraphs to exploit the massive parallelism of 3D ReRAM. Our evaluation on 8-layer stacked ReRAM shows that RAGra outperforms GraphR, the state-of-the-art planar (2D) ReRAM-based graph accelerator, with a 6.18× performance improvement and a 2.21× energy saving on average. In particular, RAGra outperforms GridGraph (a typical CPU-based graph system) by up to 293.12×.
