True 4D Image Denoising on the GPU

Image denoising is an important part of many medical imaging applications. One common application is improving the image quality of low-dose (noisy) computed tomography (CT) data. While 3D image denoising has previously been applied to several volumes independently, little work has been done on true 4D image denoising, where the algorithm considers several volumes at the same time. The problem with 4D image denoising, compared to 2D and 3D denoising, is that the computational complexity grows exponentially with the number of dimensions. In this paper we describe a novel algorithm for true 4D image denoising, based on local adaptive filtering, and how to implement it on the graphics processing unit (GPU). The algorithm was applied to a 4D CT heart dataset with a resolution of 512 × 512 × 445 × 20. On the GPU, the denoising completes in about 25 minutes with spatial filtering and in about 8 minutes with FFT-based filtering. The CPU implementation requires several days of processing time for spatial filtering and about 50 minutes for FFT-based filtering. The short processing time significantly increases the clinical value of true 4D image denoising.
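To give a rough sense of why the cost grows so quickly with dimensionality, the following sketch counts the multiply-adds needed to apply a single non-separable spatial filter at every sample of the dataset. Only the dataset resolution comes from the text above. The filter width of 11 samples per dimension, the 4 bytes per sample, and the function names are assumptions chosen for illustration and are not the values used in the paper.

```python
from math import prod


def voxel_count(shape):
    """Total number of samples in an array of the given shape."""
    return prod(shape)


def spatial_filter_cost(shape, filter_width):
    """Multiply-adds for one non-separable spatial filter applied at every sample.

    The filter is assumed to have `filter_width` samples along each
    dimension of the data, so it holds filter_width ** ndim coefficients.
    """
    return voxel_count(shape) * filter_width ** len(shape)


# Dataset resolution quoted in the abstract.
shape_4d = (512, 512, 445, 20)

# ASSUMPTION: illustrative filter width, not taken from the paper.
width = 11

# ASSUMPTION: samples stored as 32-bit floats (4 bytes each).
bytes_per_sample = 4

# Coefficients per filter, i.e. multiply-adds per output sample, in 2D, 3D and 4D.
per_sample_by_dim = {d: width ** d for d in (2, 3, 4)}

total_samples = voxel_count(shape_4d)
total_bytes = total_samples * bytes_per_sample
total_4d = spatial_filter_cost(shape_4d, width)

if __name__ == "__main__":
    print("samples in dataset:      ", total_samples)
    print("bytes at 4 B per sample: ", total_bytes)
    print("multiply-adds per sample:", per_sample_by_dim)
    print("multiply-adds, 4D filter:", total_4d)
```

Under these assumptions, each added dimension multiplies the per-sample work by the filter width, which is the exponential growth mentioned above. An adaptive filtering scheme typically applies several such filters, so the count for a single filter should be read as a lower bound rather than an estimate of the full algorithm.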