GPU acceleration of a fully 3D Iterative Reconstruction Software for PET using CUDA

A CUDA implementation of the existing software FIRST (Fast Iterative Reconstruction Software for (PET) Tomography) is presented. This implementation uses consumer graphics processing units (GPUs) to accelerate the compute-intensive parts of the reconstruction: forward and backward projection. FIRST was originally developed in FORTRAN and has been migrated to C, both so that it can be used with NVIDIA C for CUDA and to allow a straightforward implementation and performance comparison between the C versions of the code running on the CPU and on the GPU. We measured the execution time of the CUDA version and compared it with that of the C version running on the fastest available CPU. The CUDA implementation also includes loop re-ordering and optimized memory allocation, which further improve the performance of the reconstruction on GPUs.
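To make the two accelerated operations concrete, the following is a minimal CPU sketch in C of forward and backward projection. It is not taken from FIRST: the dense row-major system matrix `A`, the function names `forward_project` and `back_project`, and the parameters `n_lors` and `n_voxels` are all assumptions chosen for illustration, and a real 3D PET code would typically use a sparse or on-the-fly system matrix.

```c
#include <stddef.h>

/* Illustrative sketch only; names and the dense matrix layout are
 * assumptions, not the FIRST implementation.
 * A is an n_lors x n_voxels system matrix stored row-major:
 * A[i * n_voxels + j] weights voxel j along line of response i. */

/* Forward projection: y[i] = sum_j A[i][j] * x[j]
 * (image estimate -> estimated projection data). */
void forward_project(const float *A, const float *x, float *y,
                     size_t n_lors, size_t n_voxels)
{
    for (size_t i = 0; i < n_lors; ++i) {
        float acc = 0.0f;
        for (size_t j = 0; j < n_voxels; ++j)
            acc += A[i * n_voxels + j] * x[j];
        y[i] = acc;
    }
}

/* Backward projection: x[j] = sum_i A[i][j] * y[i]
 * (projection data -> image space).
 * The voxel loop is outermost, so each output element is written by
 * exactly one outer iteration. This is one possible loop ordering,
 * shown here as an assumption rather than the ordering used in FIRST. */
void back_project(const float *A, const float *y, float *x,
                  size_t n_lors, size_t n_voxels)
{
    for (size_t j = 0; j < n_voxels; ++j) {
        float acc = 0.0f;
        for (size_t i = 0; i < n_lors; ++i)
            acc += A[i * n_voxels + j] * y[i];
        x[j] = acc;
    }
}
```

In both functions every outer-loop iteration is independent of the others, which is why operations of this shape are natural candidates for a one-thread-per-output-element mapping on a GPU. The choice of which loop is outermost determines which array is accessed contiguously, so loop order and memory layout interact.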