GPU optimized computation of stencil based algorithms

This paper describes an optimized GPU-based approach for stencil-based algorithms. The simulations were performed for a two-dimensional steady-state heat conduction problem, solved with the red-black point successive over-relaxation (SOR) method. Two kernels were developed, and their performance was greatly improved through coalesced memory accesses and specialized use of shared memory. The approach is a step forward not only for the steady-state heat conduction problem but also for any other stencil-based algorithm, including those that solve partial differential equations numerically. The paper describes both the successive code versions and the process that led to these improvements. The optimized GPU code is also compared with the corresponding CPU version. The test results show that the GPU algorithm always yields an improvement, although its size depends strongly on the number of grid points on which the computations are performed.
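
To make the method named above concrete, the following is a minimal serial sketch of red-black point SOR for the two-dimensional steady-state heat conduction (Laplace) problem on a square grid. It is not the paper's GPU code. The function name `red_black_sor`, the grid size, the relaxation factor `omega`, the sweep count, and the boundary values (top edge at 1, other edges at 0) are all assumptions chosen for illustration.

```python
def red_black_sor(n, omega=1.5, sweeps=500, top=1.0):
    """Approximate the steady-state temperature on an n x n grid.

    Illustrative sketch only: boundary values, omega and the fixed
    number of sweeps are assumed, not taken from the paper.
    """
    # Dirichlet boundary (assumed): top edge held at `top`, other edges at 0.
    u = [[0.0] * n for _ in range(n)]
    for j in range(n):
        u[0][j] = top

    for _ in range(sweeps):
        # colour 0 = "red" points (i + j even), colour 1 = "black" (i + j odd).
        for colour in (0, 1):
            for i in range(1, n - 1):
                # First interior column whose parity matches this colour.
                start = 1 + ((i + 1 + colour) % 2)
                for j in range(start, n - 1, 2):
                    # Five-point stencil: average of the four neighbours.
                    average = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                      + u[i][j - 1] + u[i][j + 1])
                    # Over-relaxed update towards the stencil average.
                    u[i][j] += omega * (average - u[i][j])
    return u


if __name__ == "__main__":
    grid = red_black_sor(5)
    print(round(grid[2][2], 6))  # temperature at the centre of the plate
```

Every point of one colour depends only on points of the other colour, so all updates within a half-sweep are independent of each other. This is what makes the scheme a natural fit for a GPU, where each point of the active colour could be assigned to its own thread.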
