A Locality-Aware Energy-Efficient Accelerator for Graph Mining Applications

Graph mining is becoming increasingly important due to the ever-increasing demands on analyzing complex structures in graphs. Existing graph accelerators typically hold most of the randomly-accessed data in an on-chip memory to avoid off-chip communications. However, graph mining exhibits substantial random accesses from not only vertex dimension but also edge dimension (with the latter being excessively more complex than the former), leading to significant degradations in terms of both performance and energy efficiency.We observe that the most random memory requests arising in graph mining come from accessing a small fraction of valuable (vertex and edge) data when handling real-world graphs. To exploit this extension locality with maximum parallelism, we architect GRAMER, the first graph mining accelerator. GRAMER contains a specialized memory hierarchy, where the valuable data (precisely identified through a cost-efficient heuristic) is permanently resident in a high-priority memory while others are maintained in a cache-like memory under a lightweight replacement policy. The specific pipelined processing units are carefully designed to maximize computational parallelism. GRAMER is also equipped with a work-stealing mechanism to reduce load imbalance. We have implemented GRAMER on a Xilinx Alveo U250 accelerator card. Compared with two state-of-the-art CPU-based graph mining systems, Fractal and RStream, running on a 14-core Intel E5-2680 v4 processor, GRAMER achieves not only considerable speedups (1.11 × ∼ 129.95 ) but also significant energy savings (5.79 × ∼ 678.34×)

[1]  Sachin Katti,et al.  Cliffhanger: Scaling Performance Cliffs in Web Memory Caches , 2016, NSDI.

[2]  Yu Wang,et al.  ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture , 2017, FPGA.

[3]  Bingsheng He,et al.  Fast Subgraph Matching on Large Graphs using Graphics Processors , 2015, DASFAA.

[4]  Sourav S. Bhowmick,et al.  DUALSIM: Parallel Subgraph Enumeration in a Massive Graph on a Single Machine , 2016, SIGMOD Conference.

[5]  Greg Linden,et al.  Amazon . com Recommendations Item-to-Item Collaborative Filtering , 2001 .

[6]  Fernando M. A. Silva,et al.  G-Tries: a data structure for storing and finding subgraphs , 2014, Data Mining and Knowledge Discovery.

[7]  Wenguang Chen,et al.  Gemini: A Computation-Centric Distributed Graph Processing System , 2016, OSDI.

[8]  Irfan Ahmad,et al.  Cache Modeling and Optimization using Miniature Simulations , 2017, USENIX Annual Technical Conference.

[9]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[10]  Guy E. Blelloch,et al.  Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.

[11]  Benno Schwikowski,et al.  Graph-based methods for analysing networks in cell biology , 2006, Briefings Bioinform..

[12]  Joseph Gonzalez,et al.  PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.

[13]  Charu C. Aggarwal,et al.  Managing and Mining Graph Data , 2010, Managing and Mining Graph Data.

[14]  Song Jiang,et al.  LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance , 2002, SIGMETRICS '02.

[15]  Jure Leskovec,et al.  {SNAP Datasets}: {Stanford} Large Network Dataset Collection , 2014 .

[16]  Srinivasan Parthasarathy,et al.  Fractal: A General-Purpose Graph Pattern Mining System , 2019, SIGMOD Conference.

[17]  David A. Patterson,et al.  Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server , 2015, 2015 IEEE International Symposium on Workload Characterization.

[18]  Guy E. Blelloch,et al.  GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.

[19]  Dennis Shasha,et al.  2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm , 1994, VLDB.

[20]  Li Zhao,et al.  Analysis and Optimization of the Memory Hierarchy for Graph Processing Workloads , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[21]  Wenguang Chen,et al.  GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning , 2015, USENIX ATC.

[22]  John C. S. Lui,et al.  G-thinker: A Distributed Framework for Mining Subgraphs in a Big Graph , 2020, 2020 IEEE 36th International Conference on Data Engineering (ICDE).

[23]  Jonathan W. Berry,et al.  Challenges in Parallel Graph Processing , 2007, Parallel Process. Lett..

[24]  Kai Wang,et al.  RStream: Marrying Relational Algebra with Streaming for Efficient Graph Mining on A Single Machine , 2018, OSDI.

[25]  Pengcheng Yao,et al.  An efficient graph accelerator with parallel data conflict management , 2018, PACT.

[26]  Shreesha Srinath,et al.  An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[27]  Xing Xie,et al.  Network Motif Discovery: A GPU Approach , 2015, IEEE Transactions on Knowledge and Data Engineering.

[28]  Tianshi Chen,et al.  TuNao: A High-Performance and Energy-Efficient Reconfigurable Accelerator for Graph Processing , 2017, 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).

[29]  Daniel Sánchez,et al.  Talus: A simple way to remove cliffs in cache performance , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[30]  Panos Kalnis,et al.  ScaleMine: Scalable Parallel Frequent Subgraph Mining in a Single Large Graph , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.

[31]  Jimmy J. Lin,et al.  NScale: neighborhood-centric large-scale graph analytics in the cloud , 2014, The VLDB Journal.

[32]  Mohan Kumar,et al.  Mosaic: Processing a Trillion-Edge Graph on a Single Machine , 2017, EuroSys.

[33]  Mohammed J. Zaki,et al.  Arabesque: a system for distributed graph mining , 2015, SOSP.

[34]  Jiangchuan Liu,et al.  Statistics and Social Network of YouTube Videos , 2008, 2008 16th Interntional Workshop on Quality of Service.

[35]  Zhimin Zhang,et al.  Alleviating Irregularity in Graph Analytics Acceleration: a Hardware/Software Co-Design Approach , 2019, MICRO.

[36]  Jianzhong Li,et al.  Efficient Subgraph Matching on Billion Node Graphs , 2012, Proc. VLDB Endow..

[37]  Ramyad Hadidi,et al.  GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[38]  Weimin Zheng,et al.  Squeezing out All the Value of Loaded Data: An Out-of-core Graph Processing System with Reduced Disk I/O , 2017, USENIX Annual Technical Conference.

[39]  Hong Cheng,et al.  Subgraph Matching: on Compression and Computation , 2017, Proc. VLDB Endow..

[40]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[41]  Hai Jin,et al.  PathGraph: A Path Centric Graph Processing System , 2016, IEEE Transactions on Parallel and Distributed Systems.

[42]  Margaret Martonosi,et al.  Graphicionado: A high-performance and energy-efficient accelerator for graph analytics , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[43]  Shirish Tatikonda,et al.  From "Think Like a Vertex" to "Think Like a Graph" , 2013, Proc. VLDB Endow..

[44]  George Karypis,et al.  Finding Frequent Patterns in a Large Sparse Graph* , 2005, Data Mining and Knowledge Discovery.

[45]  Reynold Xin,et al.  GraphX: Graph Processing in a Distributed Dataflow Framework , 2014, OSDI.

[46]  Yogesh L. Simmhan,et al.  GoFFish: A Sub-graph Centric Framework for Large-Scale Graph Analytics , 2013, Euro-Par.

[47]  Ozcan Ozturk,et al.  Energy Efficient Architecture for Graph Analytics Accelerators , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[48]  Panos Kalnis,et al.  GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph , 2014, Proc. VLDB Endow..