Communication Optimization for Medical Image Reconstruction Algorithms

This paper presents experiences and results from optimizing the parallel communication performance of a production-quality medical image reconstruction application. The fundamental communication operations in the application's principal algorithm are collective reductions. The overhead of these operations was reduced by transforming the algorithm so that its computation overlaps with its communication. Several approaches to communication progress were studied, both user-directed and asynchronous. Experimental results comparing the new approach with the previous implementation show overall application performance improvements of up to 8% when run on 32 nodes.
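The overlap of computation with a collective reduction can be illustrated with a minimal sketch. It is not the application's code: a background thread stands in for a non-blocking collective, and the names `allreduce_sum`, `compute_block`, and `reconstruct` are hypothetical, as are the rank and block counts. The sketch only shows the pipelining pattern, in which the reduction of one block is in flight while the next block is computed.

```python
from concurrent.futures import ThreadPoolExecutor


def allreduce_sum(contributions):
    """Stand-in for a collective reduction: element-wise sum over all ranks."""
    return [sum(values) for values in zip(*contributions)]


def compute_block(rank, block):
    """Hypothetical per-rank computation producing a partial result vector."""
    return [float(rank + 1) * (block + 1)] * 4


def reconstruct(num_ranks=4, num_blocks=3):
    """Pipeline the blocks so each reduction overlaps the next computation."""
    results = []
    # One worker thread plays the role of asynchronous communication progress.
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = None  # handle of the reduction currently in flight
        for block in range(num_blocks):
            # Compute this block while the previous reduction is still running.
            partials = [compute_block(r, block) for r in range(num_ranks)]
            # Complete the previous reduction only after the computation.
            if pending is not None:
                results.append(pending.result())
            # Start the reduction for this block without waiting for it.
            pending = comm.submit(allreduce_sum, partials)
        # Drain the last outstanding reduction.
        results.append(pending.result())
    return results


if __name__ == "__main__":
    for reduced in reconstruct():
        print(reduced)
```

In an MPI setting the submit and result steps would correspond to starting a non-blocking reduction and later waiting on its request; whether the reduction actually advances during the computation depends on how communication progress is provided, which is the question the user-directed and asynchronous approaches address.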
