Paper information - Overlapping computation and communication of three-dimensional FDTD on a GPU cluster

Overlapping computation and communication of three-dimensional FDTD on a GPU cluster

Abstract: Large-scale electromagnetic field simulations using the finite-difference time-domain (FDTD) method require GPU (graphics processing unit) clusters. However, the communication overhead caused by slow interconnects becomes a major performance bottleneck. To remove this bottleneck, we propose the ‘kernel-split method’ and the ‘host-buffer method’, which overlap computation and communication for FDTD simulation on a GPU cluster. The host-buffer method in particular enables overlapping without any modification to the update kernels already in use. We also present theoretical formulas that predict the overlap threshold and the total throughput for each method. Using our overlap methods on 6 GPU nodes, we demonstrate that the total performance of 3D FDTD reaches 92% of the six-fold speedup that would be the upper limit if there were no communication overhead.

Ki-Hwan Kim | Q.-Han Park
