Performance analysis and optimization of three-dimensional FDTD on GPU using roofline model

Abstract The Finite-Difference Time-Domain (FDTD) method is commonly used for electromagnetic field simulations. Recently, successful hardware acceleration of large-scale FDTD simulations using Graphics Processing Units (GPUs) has been reported. In this paper, we present a performance analysis of three-dimensional (3D) FDTD on a GPU using the roofline model. We find that the theoretical predictions of maximum performance agree well with the experimental results. We also suggest suitable optimization methods for obtaining the best FDTD performance on a GPU. In particular, the optimized 3D FDTD program on a GPU (NVIDIA GeForce GTX 480) is shown to be 64 times faster than a naively implemented program on a CPU (Intel Core i7 2600).
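As a rough illustration of the roofline bound referred to in the abstract, the attainable performance of a kernel can be estimated as the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity. The sketch below is a minimal example under stated assumptions: the function name `roofline_gflops` and all numeric values are hypothetical placeholders, not the measured figures or the kernel characteristics reported in the paper.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline bound in GFLOP/s: min(compute roof, memory roof)."""
    memory_roof = bandwidth_gbs * flops_per_byte  # bandwidth-limited rate
    return min(peak_gflops, memory_roof)


# Hypothetical device and kernel parameters (assumptions, for illustration only).
peak = 1000.0      # peak compute rate, GFLOP/s
bandwidth = 150.0  # sustained memory bandwidth, GB/s
intensity = 0.5    # arithmetic intensity of a stencil-like kernel, FLOP/byte

bound = roofline_gflops(peak, bandwidth, intensity)
ridge = peak / bandwidth  # intensity above which the kernel becomes compute-bound

print(f"attainable bound: {bound} GFLOP/s")
print(f"ridge point: {ridge:.2f} FLOP/byte")
```

With these placeholder values the kernel sits well to the left of the ridge point, so the bound is set by memory bandwidth. In that regime, optimizations that reduce memory traffic per update would be expected to matter more than those that reduce arithmetic.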
