CFD code adaptation to the FPGA architecture

In recent years, we have observed intensive development of accelerated computing platforms. Although current trends indicate a well-established position of GPU devices in the HPC environment, the FPGA (Field-Programmable Gate Array) aspires to be an alternative solution for offloading CPU computation. This paper presents a systematic adaptation of four different CFD (Computational Fluid Dynamics) kernels to the Xilinx Alveo U250 FPGA. The goal of this paper is to investigate the potential of the FPGA architecture as a future infrastructure able to support the most complex numerical simulations in the area of fluid flow modeling. The selected kernels are customized to a real scientific scenario, compatible with the EULAG (Eulerian/semi-Lagrangian) fluid solver. The solver is used to simulate thermo-fluid flows across a wide range of scales and is extensively used in numerical weather prediction. The proposed adaptation focuses on analyzing the strengths and weaknesses of the FPGA accelerator in terms of performance and energy efficiency. It is compared with a CPU implementation that was strongly optimized to provide realistic and objective benchmarks. The performance results are compared across a set of server CPUs from various Intel generations, including Skylake-based CPUs such as the Xeon Gold 6148 and Xeon Platinum 8168, as well as the Intel Xeon E5-2695 CPU based on the Ivy Bridge architecture. Since all the kernels belong to the group of memory-bound algorithms, our main challenge is to saturate the global memory bandwidth and provide data locality through intensive reuse of BRAM (Block RAM). In terms of performance per watt, our adaptation improves on the CPUs by up to 80%.

Kamil Halbiniak | Krzysztof Rojek | Lukasz Kuczynski
