A novel ReRAM-based processing-in-memory architecture for graph computing

Graph algorithms such as breadth-first search (BFS) have been gaining ever-increasing importance in the era of Big Data. However, the memory bandwidth remains the key performance bottleneck for graph processing. To address this problem, we utilize processing-in-memory (PIM), combined with non-volatile metal-oxide resistive random access memory (ReRAM), to improve the performance of both computation and I/O. The idea is to integrate the computation logic into the memory in which the data accesses are located. We propose and implement a new ReRAM-based processing-in-memory architecture called RPBFS, in which graphs can be processed and persistently stored. We also design an efficient graph traversal scheme. Benefited from low data movement overhead and bank-level parallel computation, RPBFS shows a significant performance improvement compared with both the CPU-based and GPU-based BFS implementations. On a suite of real world graphs, our architecture yields up to 33.8× speedup.

[1]  Kiyoung Choi,et al.  A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[2]  Tao Zhang,et al.  Overcoming the challenges of crossbar resistive memory architectures , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[3]  P. J. Narayanan,et al.  Accelerating Large Graph Algorithms on the GPU Using CUDA , 2007, HiPC.

[4]  Ozcan Ozturk,et al.  Energy Efficient Architecture for Graph Analytics Accelerators , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[5]  Tao Zhang,et al.  PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[6]  David A. Patterson,et al.  Direction-optimizing breadth-first search , 2012, HiPC 2012.

[7]  H. Howie Huang,et al.  Enterprise: breadth-first graph traversal on GPUs , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.

[8]  Martin D. F. Wong,et al.  An effective GPU implementation of breadth-first search , 2010, Design Automation Conference.

[9]  David A. Patterson,et al.  Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server , 2015, 2015 IEEE International Symposium on Workload Characterization.

[10]  Ligang Gao,et al.  High precision tuning of state for memristive devices by adaptable variation-tolerant algorithm , 2011, Nanotechnology.

[11]  Ching-Yung Lin,et al.  GraphBIG: understanding graph computing in the context of industrial solutions , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.

[12]  Miao Hu,et al.  ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[13]  Cong Xu,et al.  NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.