Parallel medical image reconstruction: from graphics processing units (GPU) to Grids

We present and compare a variety of parallelization approaches for a real-world case study on modern parallel and distributed computer architectures. Our case study is a production-quality, time-intensive algorithm for medical image reconstruction used in positron emission tomography (PET). We parallelize this algorithm for the main kinds of contemporary parallel architectures: shared-memory multiprocessors, distributed-memory clusters, graphics processing units (GPUs) programmed with the CUDA framework, and the Cell processor. Finally, we describe how various architectures can be accessed in a distributed Grid environment. The main contribution of the paper, besides the parallelization approaches themselves, is their systematic comparison with respect to four important criteria: performance, programming comfort, accessibility, and cost-effectiveness. We report results of experiments on particular parallel machines of different architectures that confirm the findings of our systematic comparison.
