Acceleration of a finite-difference method with general-purpose GPUs: Lessons learned

Modern massively parallel graphics cards (GPGPUs) promise to dramatically reduce the computation times of numerically intensive, data-parallel algorithms. Because these cards integrate easily into desktop PCs, they can bring computational power previously reserved for computer clusters into the office. Their high performance makes GPGPUs a very attractive target platform for scientific simulations. In this paper we present the lessons learned while parallelizing the finite-difference time-domain (FDTD) method, an inherently data-parallel algorithm frequently used in numerical computations, on state-of-the-art graphics hardware.