Fast GPU-Based CT Reconstruction using the Common Unified Device Architecture (CUDA)

The Common Unified Device Architecture (CUDA) is a fundamentally new programming approach that makes use of the unified shader design of the most recent Graphics Processing Units (GPUs) from NVIDIA. Its programming interface allows an algorithm to be implemented in standard C with a few extensions, without any knowledge of graphics programming with OpenGL, DirectX, or shading languages. We apply this technology to the FDK method, which solves the three-dimensional reconstruction task in cone-beam CT. The computational complexity of this algorithm prohibits its use in many medical applications without hardware acceleration. Today's GPUs, with their high level of parallelism, are cost-efficient processors for performing the FDK reconstruction according to medical requirements. In this paper, we present an implementation of the most time-consuming parts of the FDK algorithm: filtering and back-projection. We also explain the transformations required to parallelize the algorithm for the CUDA architecture. Our implementation approach further allows on-the-fly reconstruction, meaning that each projection image is processed as it arrives and the reconstruction is completed right after the end of data acquisition. This enables us to present the reconstructed volume to the physician in real time, immediately after the last projection image has been acquired by the scanning device. Finally, we compare our results to our highly optimized FDK implementation on the Cell Broadband Engine Architecture (CBEA), with respect to both reconstruction speed and implementation effort.
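To illustrate the kind of computation the back-projection step involves, the following is a minimal sketch in plain C, not the implementation described in this paper. It assumes a voxel-driven scheme in which a 3x4 projection matrix maps each voxel to detector coordinates, bilinear interpolation of the already filtered projection, and an inverse-square distance weight. The names `backproject_one` and `sample_bilinear`, the row-major memory layouts, and the use of voxel indices as world coordinates are all hypothetical choices made for this example.

```c
#include <stddef.h>

/* Bilinear lookup in a filtered projection of nu x nv pixels (row-major).
   Positions outside the interpolable detector area contribute zero. */
static float sample_bilinear(const float *img, int nu, int nv, float u, float v)
{
    if (!(u >= 0.0f && v >= 0.0f && u < (float)(nu - 1) && v < (float)(nv - 1)))
        return 0.0f;
    int u0 = (int)u;                    /* truncation equals floor, since u >= 0 */
    int v0 = (int)v;
    float fu = u - (float)u0;
    float fv = v - (float)v0;
    const float *p = img + (size_t)v0 * (size_t)nu + (size_t)u0;
    return (1.0f - fv) * ((1.0f - fu) * p[0]  + fu * p[1])
         +         fv  * ((1.0f - fu) * p[nu] + fu * p[nu + 1]);
}

/* Accumulate ONE filtered projection into the volume (voxel-driven).
   P is an assumed row-major 3x4 matrix mapping voxel indices (x, y, z, 1)
   to homogeneous detector coordinates (u*w, v*w, w). */
void backproject_one(float *vol, int nx, int ny, int nz,
                     const float *proj, int nu, int nv,
                     const float P[12])
{
    for (int z = 0; z < nz; ++z) {
        for (int y = 0; y < ny; ++y) {
            for (int x = 0; x < nx; ++x) {
                /* Each iteration is independent of all others, which is
                   what makes a one-thread-per-voxel mapping possible. */
                float uw = P[0] * x + P[1] * y + P[2]  * z + P[3];
                float vw = P[4] * x + P[5] * y + P[6]  * z + P[7];
                float w  = P[8] * x + P[9] * y + P[10] * z + P[11];
                if (w <= 0.0f)
                    continue;           /* voxel is not in front of the source */
                float inv = 1.0f / w;
                size_t idx = ((size_t)z * (size_t)ny + (size_t)y) * (size_t)nx
                           + (size_t)x;
                /* Inverse-square distance weight times interpolated value. */
                vol[idx] += inv * inv *
                            sample_bilinear(proj, nu, nv, uw * inv, vw * inv);
            }
        }
    }
}
```

Because the routine consumes a single projection and only adds to the volume, it can be called once for each projection image as soon as that image has been filtered, which is the access pattern an on-the-fly reconstruction relies on.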