Overlapping computation and communication of three-dimensional FDTD on a GPU cluster

Abstract: Large-scale electromagnetic field simulations using the finite-difference time-domain (FDTD) method require clusters of graphics processing units (GPUs). However, the communication overhead caused by slow interconnects becomes a major performance bottleneck. To remove this bottleneck, we propose the ‘kernel-split method’ and the ‘host-buffer method’, which overlap computation and communication for FDTD simulation on a GPU cluster. The host-buffer method in particular enables overlapping without any modification to the update kernels already in use. We also present theoretical formulas that predict the overlap threshold and the total throughput for each method. Using our overlap methods on 6 GPU nodes, we demonstrate that the total performance of 3D FDTD reaches 92% of the ideal six-fold speedup, the upper limit that would be reached if there were no communication overhead.
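To make the benefit of overlapping concrete, the following is a minimal timing sketch in Python. It is not the theoretical formulas of the paper, which are not reproduced in this abstract. It assumes a deliberately simple model in which each time step on a node costs `t_compute` for the field update and `t_comm` for the boundary exchange, and in which the work divides evenly across nodes. The function names and the example timings are hypothetical.

```python
def step_time(t_compute, t_comm, overlap):
    """Wall time of one time step on one node (assumed two-term model).

    Without overlap the boundary exchange is serialized after the update.
    With ideal overlap the shorter of the two is hidden behind the longer.
    """
    if overlap:
        return max(t_compute, t_comm)
    return t_compute + t_comm


def speedup(n_nodes, t_compute, t_comm, overlap):
    """Speedup over a single node that needs no communication.

    Assumes a single node would need n_nodes * t_compute per step,
    i.e. the work splits evenly across nodes.
    """
    single_node_time = n_nodes * t_compute
    return single_node_time / step_time(t_compute, t_comm, overlap)


if __name__ == "__main__":
    # Hypothetical timings in arbitrary units, not measurements from the paper.
    n, t_compute, t_comm = 6, 10.0, 4.0
    print("no overlap:", speedup(n, t_compute, t_comm, overlap=False))
    print("overlap:   ", speedup(n, t_compute, t_comm, overlap=True))
```

Under this model, communication is fully hidden only while `t_comm` does not exceed `t_compute`; beyond that point the exchange time dominates each step. This is the kind of threshold behavior that an overlap-threshold formula would describe.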
