Tetrahedral Interpolation for Deformable Image Registration on GPUs

We accelerate the tetrahedral interpolation step of MORFEUS, a deformable image registration code. We implement several versions of the interpolation code on a Fermi GPU (GTX 480). Despite the irregularity of the code, we obtain kernel speedups of up to 24.6x, 33.7x and 62.4x on three real-life benchmarks. These numbers exclude the data transfer time between the CPU and the GPU, because that cost can be amortized over the other steps of the MORFEUS code. We nevertheless explore the impact of different GPU data transfer techniques.
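For readers unfamiliar with the operation, the following is a minimal sketch of interpolation inside a single tetrahedron using barycentric weights computed from signed volumes. It is an illustration only and is not taken from MORFEUS or from the GPU kernels evaluated here; the function names (`signed_volume6`, `barycentric_weights`, `interpolate_in_tet`) are hypothetical, and the tetrahedron is assumed to be non-degenerate.

```python
def signed_volume6(a, b, c, d):
    """Return six times the signed volume of tetrahedron (a, b, c, d)."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    ad = [d[i] - a[i] for i in range(3)]
    # Scalar triple product ab . (ac x ad), expanded along the first row.
    return (ab[0] * (ac[1] * ad[2] - ac[2] * ad[1])
            - ab[1] * (ac[0] * ad[2] - ac[2] * ad[0])
            + ab[2] * (ac[0] * ad[1] - ac[1] * ad[0]))


def barycentric_weights(verts, p):
    """Barycentric weights of point p for the tetrahedron verts[0..3].

    Weight i is the volume of the tetrahedron obtained by replacing
    vertex i with p, divided by the full volume.
    Assumption: the tetrahedron is non-degenerate (non-zero volume).
    """
    total = signed_volume6(*verts)
    weights = []
    for i in range(4):
        replaced = list(verts)
        replaced[i] = p
        weights.append(signed_volume6(*replaced) / total)
    return weights


def is_inside(weights, eps=1e-12):
    """p lies inside (or on the boundary of) the tetrahedron
    when no weight is negative."""
    return all(w >= -eps for w in weights)


def interpolate_in_tet(verts, values, p):
    """Linearly interpolate per-vertex vectors (e.g. displacements) at p."""
    w = barycentric_weights(verts, p)
    dim = len(values[0])
    return [sum(w[i] * values[i][k] for i in range(4)) for k in range(dim)]


if __name__ == "__main__":
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    disp = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 4.0)]
    print(interpolate_in_tet(verts, disp, (0.25, 0.25, 0.25)))
```

In a sketch like this, each query point must first be matched to the tetrahedron that contains it, and the vertex data it then reads depends on that match. Data-dependent lookups of this kind are one plausible source of the irregularity mentioned above.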