Paper information - GraVF-M: Graph Processing System Generation for Multi-FPGA Platforms

Due to the irregular nature of connections in most graph datasets, partitioning graph analysis algorithms across multiple computational nodes that do not share a common memory inevitably leads to large amounts of interconnect traffic. Previous research has shown that FPGAs can outcompete software-based graph processing in shared memory contexts, but it remains an open question whether this advantage can be maintained in distributed systems. In this work, we present GraVF-M, a framework designed to ease the implementation of FPGA-based graph processing accelerators for multi-FPGA platforms with distributed memory. Based on a lightweight description of the algorithm kernel, the framework automatically generates optimized RTL code for the whole multi-FPGA design. We exploit an aspect of the programming model to present a familiar message-passing paradigm to the user, while under the hood implementing a more efficient architecture that can reduce the necessary inter-FPGA network traffic by a factor equal to the average degree of the input graph. A performance model based on a theoretical analysis of the factors influencing performance serves to evaluate the efficiency of our implementation. With a throughput of up to 5.8 GTEPS (billions of traversed edges per second) on a 4-FPGA system, the designs generated by GraVF-M compare favorably to state-of-the-art frameworks from the literature and reach 94% of the projected performance limit of the system.
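The abstract does not spell out how the traffic reduction is achieved, so the following is only a minimal sketch of where a factor equal to the average degree could come from. It assumes that every vertex is active in a superstep, that a vertex sends the same value along all of its outgoing edges, and that a single update per vertex (counted as one network transfer) can be expanded into per-edge messages on the receiving side. The function names `per_edge_scheme` and `per_vertex_scheme` and the toy graph are hypothetical and not taken from GraVF-M.

```python
from collections import Counter

def per_edge_scheme(edges, value):
    """Sender-side expansion: one network transfer per directed edge."""
    transfers = [(dst, value[src]) for src, dst in edges]
    return len(transfers), Counter(transfers)

def per_vertex_scheme(edges, value):
    """Receiver-side expansion (assumed): one transfer per source vertex.

    The receiver is assumed to hold the edge list and fans each update
    out to the destination vertices locally, without network traffic.
    """
    updates = {src: value[src] for src, _ in edges}  # one update per sending vertex
    delivered = Counter((dst, updates[src]) for src, dst in edges)
    return len(updates), delivered

# Toy directed graph: 4 vertices, 8 edges, so the average out-degree is 2.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (2, 0), (3, 0)]
value = {0: 10, 1: 11, 2: 12, 3: 13}  # arbitrary per-vertex payloads

n_edge, got_edge = per_edge_scheme(edges, value)
n_vertex, got_vertex = per_vertex_scheme(edges, value)
avg_degree = len(edges) / len(value)
reduction = n_edge / n_vertex

print(f"per-edge transfers:   {n_edge}")
print(f"per-vertex transfers: {n_vertex}")
print(f"reduction factor:     {reduction} (average degree: {avg_degree})")
```

Both schemes deliver the same set of (destination, value) messages to the vertex kernels, which is what would allow a message-passing view to be kept for the user while fewer transfers cross the network. Under the stated assumptions the ratio of transfers is |E| / |V|, the average degree.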

Hayden K.-H. So | Nina Engelhardt
