Paper Information - FP-AMR: A Reconfigurable Fabric Framework for Adaptive Mesh Refinement Applications

Adaptive mesh refinement (AMR) is one of the most widely used methods in high-performance computing, accounting for a large fraction of all supercomputing cycles. AMR operates by dynamically and adaptively applying computational resources non-uniformly, emphasizing regions of the model as a function of their complexity. Because AMR generally uses dynamic, pointer-based data structures, acceleration is challenging, especially in hardware. As far as we are aware, no previous work has been published on accelerating AMR with FPGAs. In this paper, we introduce a reconfigurable fabric framework called FP-AMR. The work is in two parts. In the first, FP-AMR offloads the bulk per-timestep computations to the FPGA; analogous systems have previously done this with GPUs. In the second, we show that the remaining CPU-based tasks, including particle-mesh mapping, mesh refinement, and coarsening, can also be mapped efficiently to the FPGA. We have evaluated FP-AMR using the widely used program AMReX and found that a single FPGA outperforms a Xeon E5-2660 CPU server (8 cores) by 21x to 23x, depending on problem size and data distribution.
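To make the idea of non-uniform refinement concrete, here is a minimal 1D sketch in Python. It is not the FP-AMR design and does not use AMReX; the function names (`refine`, `build_mesh`), the tuple layout, and the jump-based refinement criterion are all assumptions chosen for illustration. A cell is split in two whenever the sampled function changes by more than a threshold across it, so cells cluster where the model varies fastest.

```python
import math

def refine(cells, f, threshold, max_level):
    """One refinement pass over a list of (left, right, level) cells.

    Hypothetical criterion: split a cell when |f(right) - f(left)| exceeds
    `threshold`, unless the cell is already at `max_level`.
    """
    out = []
    for left, right, level in cells:
        if level < max_level and abs(f(right) - f(left)) > threshold:
            mid = 0.5 * (left + right)
            out.append((left, mid, level + 1))
            out.append((mid, right, level + 1))
        else:
            out.append((left, right, level))
    return out

def build_mesh(f, lo, hi, n_coarse, threshold, max_level):
    """Start from a uniform coarse mesh and refine up to `max_level` times."""
    width = (hi - lo) / n_coarse
    cells = [(lo + i * width, lo + (i + 1) * width, 0) for i in range(n_coarse)]
    for _ in range(max_level):
        cells = refine(cells, f, threshold, max_level)
    return cells

# Illustrative input: a steep front at x = 0.5.
def front(x):
    return math.tanh(50.0 * (x - 0.5))

mesh = build_mesh(front, 0.0, 1.0, n_coarse=4, threshold=0.5, max_level=3)
for left, right, level in mesh:
    print(f"level {level}: [{left:.5f}, {right:.5f}]")
```

In this sketch the finest cells end up next to the front while the flat regions stay coarse. The mesh here is a flat list; production AMR codes typically use tree or block-structured hierarchies, which is where the dynamic, pointer-based data structures mentioned in the abstract come from.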

Tianqi Wang | Tong Geng | Xi Jin | Martin Herbordt
