ThunderGP: HLS-based Graph Processing Framework on FPGAs

FPGAs have become an emerging computing infrastructure in datacenters, benefiting from fine-grained parallelism, energy efficiency, and reconfigurability. Meanwhile, graph processing has attracted tremendous interest in data analytics, and demand for its performance keeps increasing with the rapid growth of data. Many works have been proposed to tackle the challenges of designing efficient FPGA-based accelerators for graph processing. However, programmability has been largely overlooked: building such accelerators still requires hardware design expertise and sizable development effort. To close this gap, we propose ThunderGP, an open-source HLS-based graph processing framework on FPGAs, with which developers can obtain the performance of FPGA-accelerated graph processing by writing only a few high-level functions, with no knowledge of the hardware. ThunderGP adopts the Gather-Apply-Scatter (GAS) model as the abstraction of various graph algorithms and realizes the model with a built-in, highly parallel, and memory-efficient accelerator template. With high-level functions as inputs, ThunderGP automatically exploits the massive resources and memory bandwidth of multiple Super Logic Regions (SLRs) on FPGAs to generate an accelerator, then deploys the accelerator and schedules tasks for it. We evaluate ThunderGP with seven common graph applications. The results show that accelerators on real hardware platforms deliver a 2.9 times speedup over the state-of-the-art approach, running at 250 MHz and achieving throughput of up to 6,400 MTEPS (Million Traversed Edges Per Second). We also conduct a case study with ThunderGP, which delivers up to 419 times speedup over the CPU-based design and requires significantly reduced development effort. This work is open-sourced on GitHub at https://github.com/Xtra-Computing/ThunderGP.
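To make the GAS abstraction concrete, the following is a minimal software sketch in Python, not ThunderGP's actual API or its HLS accelerator template. It assumes an edge-list graph and three user-supplied functions, and uses single-source shortest paths as the example algorithm. All names (`gas_iteration`, `sssp`, `mteps`) are hypothetical and chosen for illustration only. The `mteps` helper shows how the throughput metric quoted above is computed: traversed edges divided by elapsed seconds, in millions.

```python
from typing import Callable, List, Tuple

Edge = Tuple[int, int, float]  # (source, destination, weight); assumed layout
INF = float("inf")


def gas_iteration(
    props: List[float],
    edges: List[Edge],
    scatter: Callable[[float, float], float],
    gather: Callable[[float, float], float],
    apply: Callable[[float, float], float],
    identity: float,
) -> List[float]:
    """Run one Gather-Apply-Scatter pass over all edges (illustrative only)."""
    # Scatter: every edge turns its source property into an update value.
    # Gather: updates that target the same destination are combined.
    acc = [identity] * len(props)
    for src, dst, weight in edges:
        update = scatter(props[src], weight)
        acc[dst] = gather(acc[dst], update)
    # Apply: each vertex merges its accumulated value into its property.
    return [apply(old, a) for old, a in zip(props, acc)]


def sssp(num_vertices: int, edges: List[Edge], root: int) -> List[float]:
    """Single-source shortest paths expressed with the three GAS functions."""
    dist = [INF] * num_vertices
    dist[root] = 0.0
    while True:
        new_dist = gas_iteration(
            dist,
            edges,
            scatter=lambda prop, weight: prop + weight,  # candidate distance
            gather=min,                                   # keep the smallest
            apply=min,                                    # never get worse
            identity=INF,
        )
        if new_dist == dist:  # no property changed, so the algorithm converged
            return dist
        dist = new_dist


def mteps(edges_traversed: int, seconds: float) -> float:
    """Throughput in Million Traversed Edges Per Second."""
    return edges_traversed / seconds / 1e6
```

For example, `sssp(4, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0)], 0)` returns `[0.0, 1.0, 3.0, inf]`, with the unreachable vertex 3 left at infinity. Only the three small lambdas are algorithm-specific, which mirrors the idea that a developer supplies a few high-level functions while the framework owns the traversal.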
