cuART: Fine-Grained Algebraic Reconstruction Technique for Computed Tomography Images on GPUs

The algebraic reconstruction technique (ART) is an iterative algorithm for computed tomography (CT) image reconstruction. Because of its high computational cost, researchers have turned to modern HPC systems with GPUs to accelerate ART. However, existing proposals suffer from inefficient compressed data structures and computational kernels on GPUs. In this paper, we identify the computational patterns in ART as the product of a sparse matrix (and its transpose) with multiple vectors (SpMV and SpMV_T). Because implementations built on well-tuned libraries, including cuSPARSE, BRC, and CSR5, fall short of expectations, we propose cuART, a complete compression and parallelization solution for ART-based CT on GPUs. Based on the physical characteristics of the problem, i.e., the symmetries in the system matrix, we propose the symmetry-based CSR format (SCSR), which further reduces storage by removing non-zero elements that are redundant under symmetry. Leveraging the sparsity patterns of X-ray projection, we transform the CSR format into multiple dense sub-matrices in SCSR. We then design a transposition-free kernel to optimize data access for both SpMV and SpMV_T. The experimental results show that our mechanism significantly reduces memory usage and allows practical datasets to fit into a single GPU. Our results also show that cuART outperforms existing methods on both CPU and GPU.
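
To make the two computational patterns concrete, the following is a minimal sequential sketch of SpMV and SpMV_T computed from the same CSR arrays, so that the transpose is never materialized. It uses plain CSR, not the SCSR format, and it is not the cuART kernel; the function names and the small 3-ray by 4-pixel matrix are hypothetical and chosen only for illustration.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Forward product y = A @ x for a matrix A stored in CSR."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Non-zeros of row i occupy vals[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def csr_spmv_t(row_ptr, col_idx, vals, r, n_cols):
    """Transposed product z = A.T @ r, reusing the CSR arrays of A.

    Each non-zero A[i][j] is scattered into z[j] instead of gathered
    into y[i], so no transposed copy of A is built.
    """
    z = [0.0] * n_cols
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            z[col_idx[k]] += vals[k] * r[i]
    return z


# Hypothetical system matrix (3 rays x 4 pixels), for illustration only:
# A = [[1, 0, 2, 0],
#      [0, 3, 0, 0],
#      [4, 0, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]

forward = csr_spmv(row_ptr, col_idx, vals, [1.0, 1.0, 1.0, 1.0])
backward = csr_spmv_t(row_ptr, col_idx, vals, [1.0, 2.0, 3.0], n_cols=4)
print(forward)   # [3.0, 3.0, 9.0]
print(backward)  # [13.0, 6.0, 2.0, 15.0]
```

In this sketch the transposed product writes to `z` in a scattered order. On a GPU, parallel threads performing such writes would need atomic updates or a layout that avoids conflicts, which is one reason the data access pattern matters for SpMV_T.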
