Detection and GPU acceleration of 3D FDTD algorithms based on memory access patterns

A semi-automatic tool is reported that first analyzes a sequential FDTD program to extract its memory access patterns and related features, and then optimizes the program by combining several types of CUDA memory on both Fermi- and Kepler-architecture GPUs. Experiments show speedups of 13% on Fermi GPUs and 18% on Kepler GPUs over the unoptimized GPU version of the program. A speedup of up to 142 times over the sequential FDTD C program is achieved for a 3D FDTD mesh of 250 × 250 × 250 cells (15.625 million mesh cells) with 10-layer CPML boundary conditions over 4096 time steps.
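To make "memory access pattern" concrete, the following is a minimal C sketch of one field update in a sequential 3D FDTD loop. It is not taken from the reported tool. The function names (`cell_index`, `mesh_cells`, `update_ex`), the row-major layout with z as the fastest index, and the single lumped coefficient `coef` are all assumptions made for illustration. The comments mark the neighbor offsets (stride 1 and stride `nz`) that an access-pattern analysis would have to recognize.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: row-major storage with k (the z index) varying fastest. */
size_t cell_index(size_t i, size_t j, size_t k, size_t ny, size_t nz)
{
    return (i * ny + j) * nz + k;
}

/* Total number of cells in an nx * ny * nz mesh. */
size_t mesh_cells(size_t nx, size_t ny, size_t nz)
{
    return nx * ny * nz;
}

/*
 * One Yee-style update of the Ex component on a uniform grid.
 * coef is a hypothetical lumped coefficient (time step over permittivity
 * and cell size); material arrays and CPML terms are omitted.
 */
void update_ex(float *ex, const float *hy, const float *hz,
               size_t nx, size_t ny, size_t nz, float coef)
{
    assert(ex != NULL && hy != NULL && hz != NULL);

    for (size_t i = 0; i < nx; ++i) {
        for (size_t j = 1; j < ny; ++j) {
            for (size_t k = 1; k < nz; ++k) {
                size_t c  = cell_index(i, j, k, ny, nz);
                size_t jm = c - nz; /* neighbor (i, j-1, k): stride nz */
                size_t km = c - 1;  /* neighbor (i, j, k-1): stride 1  */

                /* Discrete curl of H: dHz/dy - dHy/dz. */
                ex[c] += coef * ((hz[c] - hz[jm]) - (hy[c] - hy[km]));
            }
        }
    }
}
```

Calling `mesh_cells(250, 250, 250)` returns 15625000, matching the 15.625 million cells quoted above. In this sketch each `ex` cell reads two `hz` values and two `hy` values, one pair contiguous in memory and one pair `nz` elements apart; reuse of this kind is what typically motivates placing data in different CUDA memory spaces.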
