A GPU-Accelerated SVD Algorithm, Based on QR Factorization and Givens Rotations, for DWI Denoising

In this work, we present a parallel implementation of the Singular Value Decomposition (SVD) on Graphics Processing Units (GPUs) using the CUDA programming model. Our approach is based on an iterative parallel version of the QR factorization by means of Givens plane rotations, ordered according to the Sameh and Kuck scheme. The parallel algorithm is driven by an outer loop executed on the CPU, and the thread and block configuration is organized to exploit shared memory and avoid repeated accesses to global memory. Where global memory must be accessed, the main kernel uses contiguous indices so that the accesses are coalesced. As a case study, we consider the application of the SVD in the Overcomplete Local Principal Component Analysis (OLPCA) algorithm for Diffusion Weighted Imaging (DWI) denoising. Our results show significant performance improvements over the CPU version, which encourages the use of the GPU implementation in this computationally expensive application.
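The abstract names the ingredients but not the exact scheme, so the following is a minimal sequential sketch in plain Python, written under our own assumptions. It groups the Givens rotations of a QR factorization into stages following the Sameh and Kuck ordering (entry (i, j) is zeroed by rotating adjacent rows i - 1 and i at stage m - i + 2j - 1, 0-based), then repeats QR factorizations until the triangular factor becomes diagonal. The function names (`sameh_kuck_stages`, `givens_qr`, `singular_values`) are hypothetical, and the iteration used for the SVD (taking the R factor of the transpose at each step) is one standard QR-based choice assumed for illustration, not necessarily the one used in the paper.

```python
import math


def sameh_kuck_stages(m, n):
    """Group the Givens rotations that zero the subdiagonal of an m x n matrix
    into stages, following the Sameh-Kuck ordering.

    Entry (i, j), i > j (0-based), is zeroed by rotating adjacent rows
    (i - 1, i) at stage m - i + 2*j - 1. Rotations in the same stage touch
    disjoint row pairs, so they can be applied concurrently.
    """
    stages = {}
    for j in range(min(n, m - 1)):
        for i in range(j + 1, m):
            stages.setdefault(m - i + 2 * j - 1, []).append((i, j))
    return [stages[s] for s in sorted(stages)]


def rotate_rows(M, i, c, s):
    """Apply the plane rotation [[c, s], [-s, c]] to rows i - 1 and i of M."""
    for k in range(len(M[0])):
        x, y = M[i - 1][k], M[i][k]
        M[i - 1][k] = c * x + s * y
        M[i][k] = -s * x + c * y


def givens_qr(A):
    """Return (Q, R) with A = Q R, Q orthogonal (m x m), R upper triangular."""
    m, n = len(A), len(A[0])
    R = [[float(v) for v in row] for row in A]
    Qt = [[1.0 if p == q else 0.0 for q in range(m)] for p in range(m)]  # accumulates Q^T
    # Sequential loop over stages (a natural candidate for a host-side loop).
    for stage in sameh_kuck_stages(m, n):
        # Rotations within a stage are independent; a GPU version could run them in parallel.
        for i, j in stage:
            a, b = R[i - 1][j], R[i][j]
            r = math.hypot(a, b)
            if r == 0.0:
                continue  # both entries already zero, nothing to annihilate
            c, s = a / r, b / r
            rotate_rows(R, i, c, s)
            rotate_rows(Qt, i, c, s)
            R[i][j] = 0.0  # drop rounding residue in the annihilated entry
    Q = [list(col) for col in zip(*Qt)]
    return Q, R


def singular_values(A, max_iters=1000, tol=1e-24):
    """Singular values of A (m >= n), largest first.

    Assumed iteration: R_{k+1} is the R factor of R_k^T. Each step preserves
    the singular values; for distinct nonzero singular values, R_k typically
    tends to a diagonal matrix whose entries are the singular values up to sign.
    """
    n = len(A[0])
    _, R = givens_qr(A)
    R = [row[:] for row in R[:n]]  # square triangular factor with the same singular values
    for _ in range(max_iters):
        _, R = givens_qr([list(col) for col in zip(*R)])
        off = sum(R[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:  # squared norm of the strictly upper triangular part
            break
    return sorted((abs(R[p][p]) for p in range(n)), reverse=True)
```

For example, `singular_values([[2, 1, 0], [1, 3, 1], [0, 1, 4]])` approaches 3 + √3, 3 and 3 - √3. Because each stage rotates disjoint pairs of adjacent rows, a CUDA kernel could apply all rotations of one stage concurrently, with one synchronization point per stage; with this ordering an m × n matrix with m > n needs m + n - 2 stages, and a square n × n matrix needs 2n - 3.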