Paper information - Performance Evaluation of Tsunami Simulation Exploiting Temporal Parallelism on FPGAs using OpenCL

We developed and evaluated tsunami simulations on FPGAs by designing optimized OpenCL kernels that perform 2-D stencil calculations. Using the Intel FPGA SDK for OpenCL, we obtained efficient FPGA designs that exploit temporal parallelism. Our best implementations achieve 446 GFlops on Arria10 and 790 GFlops on Stratix10. Both are much faster than a design that exploits only spatial parallelism, and the Stratix10 implementation outperforms our GPU implementation on a Tesla V100.
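
To illustrate what exploiting temporal parallelism in a 2-D stencil can mean, the following is a minimal software sketch, not the authors' OpenCL kernels. It assumes a generic four-neighbor averaging stencil with fixed boundary cells rather than the tsunami equations, and the names `step`, `stage`, `run_sequential`, and `run_pipelined` are hypothetical. Each `stage` consumes rows in order and emits the rows of the next time step while holding only three rows, so chaining several stages advances several time steps in a single pass over the grid, loosely analogous to cascading pipeline stages on an FPGA.

```python
def step(grid):
    """Reference: one time step over the whole grid (boundary cells stay fixed)."""
    h, w = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def run_sequential(grid, steps):
    """Apply `steps` time steps one after another, re-reading the grid each time."""
    for _ in range(steps):
        grid = step(grid)
    return grid


def stage(rows):
    """One time step as a streaming stage.

    Consumes an iterator of rows (top to bottom) and yields the updated rows,
    keeping only a three-row window. Assumes the grid has at least two rows.
    """
    prev = next(rows)
    yield prev                      # top boundary row passes through unchanged
    cur = next(rows)
    for nxt in rows:
        out = cur[:]                # copy keeps the left/right boundary cells
        for j in range(1, len(cur) - 1):
            out[j] = 0.25 * (prev[j] + nxt[j] + cur[j - 1] + cur[j + 1])
        yield out
        prev, cur = cur, nxt        # slide the window using the *old* rows
    yield cur                       # bottom boundary row passes through unchanged


def run_pipelined(grid, steps):
    """Chain `steps` stages so the grid is streamed through all of them once."""
    stream = iter(grid)
    for _ in range(steps):
        stream = stage(stream)
    return list(stream)


if __name__ == "__main__":
    grid = [[float((i * 7 + j * 3) % 5) for j in range(6)] for i in range(5)]
    print(run_pipelined(grid, 4) == run_sequential(grid, 4))
```

In this sketch the chained version produces the same values as the step-by-step version because every stage sees exactly the rows the previous time step would have written. On hardware, the motivation for such a cascade is that intermediate time steps stay on chip instead of going back to external memory.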

Naohito Nakasato | Fumiya Kono
