Fully-deformable 3D image registration in two seconds

We present a highly parallel method for accurate and efficient variational deformable 3D image registration on a consumer-grade graphics processing unit (GPU). We build on recent matrix-free variational approaches and specialize the concepts to the massively parallel many-core architecture provided by the GPU. Compared to a parallel and optimized CPU implementation, this allows us to achieve an average speedup of 32.53 on 986 real-world CT thorax-abdomen follow-up scans. At a resolution of approximately 256³ voxels, the average runtime is 1.99 seconds for the full registration. On the publicly available DIR-lab benchmark, our method ranks third with respect to average landmark error at an average runtime of 0.32 seconds.