High performance computing and simulations on the GPU using CUDA

The computational power and memory bandwidth of graphics processing units (GPUs) have made them attractive platforms for general-purpose applications, offering significant speedups over their CPU counterparts [1]. In addition, a growing number of today's state-of-the-art supercomputers include commodity GPUs to reach unprecedented levels of performance, both in raw GFLOPS and in GFLOPS per unit cost. In this paper, we provide an introduction to the CUDA programming paradigm, with an emphasis on simulations that can exploit SIMD parallelism and the high memory bandwidth of GPUs. OpenCL is also briefly described as a recent effort to establish an open standard API for general-purpose manycore architectures.