Comparing CUDA and OpenGL implementations for a Jacobi iteration

The use of the GPU as a general-purpose processor is becoming more popular, and there are different approaches to this kind of programming. In this paper we compare several implementations based on OpenGL and on CUDA for solving our test case: a weighted Jacobi iteration with a structured matrix originating from a finite element discretization of the elliptic PDE part of the cardiac bidomain equations. The CUDA implementation using textures proved to be the fastest, with a speedup of 31 over a single-core CPU implementation using SSE. CUDA proved to be an efficient and easy way of programming GPUs for general-purpose problems, though it also makes it easier to write inefficient code.
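
For readers unfamiliar with the test case, the following is a minimal sketch of a weighted Jacobi iteration, x_new = x + omega * D^-1 * (b - A x). It is not the implementation evaluated in the paper. As a stand-in for the structured finite element matrix, it assumes the 1D stencil [-1, 2, -1] with zero boundary values; the weight omega = 2/3, the sweep count, and the function names `weighted_jacobi_step` and `solve` are likewise illustrative assumptions.

```python
def weighted_jacobi_step(x, b, omega=2.0 / 3.0):
    """One weighted Jacobi sweep for the assumed 1D stencil [-1, 2, -1]."""
    n = len(x)
    x_new = [0.0] * n
    for i in range(n):
        # Neighbours outside the domain are treated as zero (assumed boundary condition).
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        # Residual of row i: b_i - (A x)_i, where the diagonal entry of A is 2.
        residual = b[i] - (2.0 * x[i] - left - right)
        # Damped update: move by omega times the residual scaled by the diagonal.
        x_new[i] = x[i] + omega * residual / 2.0
    return x_new


def solve(b, sweeps=2000, omega=2.0 / 3.0):
    """Run a fixed number of sweeps starting from the zero vector."""
    x = [0.0] * len(b)
    for _ in range(sweeps):
        x = weighted_jacobi_step(x, b, omega)
    return x
```

Each entry of `x_new` depends only on the previous iterate, so all entries can be updated independently. This is the property that makes the method a natural fit for one GPU thread or fragment per unknown.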