FP-AMG: FPGA-Based Acceleration Framework for Algebraic Multigrid Solvers

Partial Differential Equations (PDEs) are fundamental to many real-world scientific computing applications, so efficient methods for solving them have been studied for decades. Algebraic multigrid (AMG) is one of the best-known solvers for the sparse linear systems that arise from discretized PDEs. It is widely adopted in High Performance Computing (HPC) because it scales well. Accelerating AMG is nevertheless challenging for three reasons: (1) irregular computation patterns, (2) random memory accesses, and (3) a large number of kernels with varied computation types. To the best of our knowledge, there is no prior work on FPGA-based acceleration of AMG. To address these challenges, we propose FP-AMG, an efficient reconfigurable FPGA-based framework for high-performance AMG computation. To achieve full pipeline utilization, we propose a novel, scalable architecture that is reused for all AMG kernels. Because AMG is strictly memory-bound, we also propose algorithmic and architectural optimizations that keep memory-bandwidth use nearly ideal. We evaluate FP-AMG on six well-known benchmarks using two FPGA devices, one with and one without high bandwidth memory (HBM). We compare the results with a highly optimized implementation of the state-of-the-art HYPRE library running on an Intel Xeon E5-2680 v4. FP-AMG achieves average speedups of $2.5\times$ and $6.6\times$ on the FPGAs without and with HBM, respectively.
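Two of the obstacles listed above, random memory access and memory-boundedness, can be made concrete with a small example. The following is a minimal pure-Python sketch of two kernels typical of an AMG solve phase: CSR sparse matrix-vector multiplication (SpMV) and a weighted-Jacobi smoother, which is one common smoother choice. It is an illustration under stated assumptions and does not describe FP-AMG's architecture or HYPRE's implementation. The function names (`poisson_1d_csr`, `csr_spmv`, `jacobi_smooth`, `residual_norm`, `spmv_arithmetic_intensity`), the 1D Poisson test matrix, and the byte counts (8-byte values, 4-byte indices, no cache reuse of `x`) are hypothetical choices made for this example.

```python
import math
from typing import List, Tuple


# Hypothetical helper: CSR arrays for the 1D Poisson matrix tridiag(-1, 2, -1).
def poisson_1d_csr(n: int) -> Tuple[List[int], List[int], List[float]]:
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        if i > 0:
            col_idx.append(i - 1)
            vals.append(-1.0)
        col_idx.append(i)
        vals.append(2.0)
        if i < n - 1:
            col_idx.append(i + 1)
            vals.append(-1.0)
        row_ptr.append(len(col_idx))  # end offset of row i
    return row_ptr, col_idx, vals


def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A x for A stored in CSR format."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # vals and col_idx are read sequentially, but x is gathered via
            # col_idx: the address depends on the sparsity pattern, i.e. the
            # data-dependent ("random") access that defeats simple prefetching.
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def jacobi_smooth(row_ptr, col_idx, vals, x, b, omega=2.0 / 3.0, sweeps=1):
    """Weighted Jacobi: x <- x + omega * D^{-1} (b - A x). Assumes nonzero diagonal."""
    n = len(row_ptr) - 1
    diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                diag[i] = vals[k]
    for _ in range(sweeps):
        ax = csr_spmv(row_ptr, col_idx, vals, x)  # each sweep costs one SpMV
        x = [x[i] + omega * (b[i] - ax[i]) / diag[i] for i in range(n)]
    return x


def residual_norm(row_ptr, col_idx, vals, x, b):
    """Euclidean norm of b - A x."""
    ax = csr_spmv(row_ptr, col_idx, vals, x)
    return math.sqrt(sum((bi - axi) ** 2 for bi, axi in zip(b, ax)))


def spmv_arithmetic_intensity(value_bytes=8, index_bytes=4):
    """Flops per byte of CSR SpMV per nonzero, assuming no reuse of x.

    Each nonzero costs 2 flops (multiply + add) and moves one matrix value,
    one column index and one gathered x entry. Row pointers and writes to y
    are ignored because they amortize over the nonzeros of a row.
    """
    return 2.0 / (value_bytes + index_bytes + value_bytes)


if __name__ == "__main__":
    rp, ci, va = poisson_1d_csr(5)
    print(csr_spmv(rp, ci, va, [1.0] * 5))  # [1.0, 0.0, 0.0, 0.0, 1.0]
    b, x0 = [1.0] * 5, [0.0] * 5
    x = jacobi_smooth(rp, ci, va, x0, b, sweeps=10)
    print(residual_norm(rp, ci, va, x0, b), residual_norm(rp, ci, va, x, b))
    print(spmv_arithmetic_intensity())  # 0.1
```

Under these assumptions SpMV performs only about 0.1 flop per byte moved. Its throughput is therefore set by memory bandwidth rather than by arithmetic units, which is consistent with the emphasis on near-ideal bandwidth use and with the larger gains reported for the HBM device.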