Spatial-adaptive and transformer fusion network (STFNet) for low-count PET blind denoising with MRI

PURPOSE Positron emission tomography (PET) is widely used in clinical applications. PET is an emission computed tomography technique that images the radiation produced by positron annihilation. Because magnetic resonance imaging (MRI) provides anatomical information without ionizing radiation, joint PET/MRI reduces patients' radiation exposure. Improved hardware and imaging algorithms have been proposed to further reduce the radiotracer dose or the acquisition time per bed position, but few methods focus on denoising low-count PET with MRI as an additional input. Existing methods rely on fixed conventional convolutions and local attention, which do not sufficiently extract and fuse the contextual and complementary information in multi-modal inputs, so there is still much room for improvement. We therefore propose a novel deep learning method for low-count PET/MRI denoising, the spatial-adaptive and transformer fusion network (STFNet). METHODS STFNet consists of a Siamese encoder with a spatial-adaptive block (SA-block), a transformer fusion encoder (TFE) and a decoder with two branches. First, we adopt the SA-block in the Siamese encoder. The SA-block comprises a deformable convolution with fusion modulation (DCFM) and two convolutional operations, which help the network extract more relevant, long-range contextual features. Second, the pixel-to-pixel TFE helps the network establish local and global relationships between the high-level feature maps of PET and MRI. In the decoder, we design two branches, one for PET denoising and one for MRI translation, and the predictions are combined by a trainable weighted summation. The proposed algorithm is applied to predict synthetic standard-dose neck PET images from low-count neck PET images and MRI. We compare it with the existing U-Net and residual U-Net methods, each with and without MRI input.
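The pixel-to-pixel fusion and the trainable weighted summation described in METHODS can be illustrated with a small NumPy sketch. This is not the authors' TFE or decoder. It is a minimal, hypothetical single-head version that assumes every pixel of the PET and MRI feature maps becomes one token, that all 2·H·W tokens attend to each other (giving both local and global PET/MRI relationships), and that the two branch outputs are mixed by two learnable scalars normalized with a softmax. The names `pixel_attention_fusion`, `weighted_sum`, `wq`/`wk`/`wv` and `alpha`, the residual connection, and the softmax normalization are all illustrative assumptions. Layer normalization, feed-forward layers, multiple heads and the SA-block/DCFM are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pixel_attention_fusion(pet_feat, mri_feat, wq, wk, wv):
    """Hypothetical single-head pixel-to-pixel fusion of PET and MRI features.

    pet_feat, mri_feat: (C, H, W) high-level feature maps from the two encoders.
    wq, wk, wv: (C, C) query/key/value projections (illustrative, untrained).
    Returns fused PET and MRI feature maps, each of shape (C, H, W).
    """
    c, h, w = pet_feat.shape
    # Flatten each map into H*W tokens of dimension C and stack both modalities,
    # so every PET pixel can attend to every MRI pixel and vice versa.
    tokens = np.concatenate(
        [pet_feat.reshape(c, -1).T, mri_feat.reshape(c, -1).T], axis=0
    )  # (2HW, C)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)  # (2HW, 2HW), rows sum to 1
    out = tokens + attn @ v  # residual connection (assumed)
    # Split the tokens back into the two modalities and restore (C, H, W).
    pet_out = out[: h * w].T.reshape(c, h, w)
    mri_out = out[h * w :].T.reshape(c, h, w)
    return pet_out, mri_out


def weighted_sum(pred_a, pred_b, alpha):
    # Hypothetical trainable scalars `alpha`; the softmax keeps both weights
    # positive and summing to 1 (an assumption, not stated in the abstract).
    weights = softmax(np.asarray(alpha, dtype=float))
    return weights[0] * pred_a + weights[1] * pred_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pet = rng.standard_normal((8, 4, 4))
    mri = rng.standard_normal((8, 4, 4))
    w = rng.standard_normal((8, 8)) * 0.1
    fused_pet, fused_mri = pixel_attention_fusion(pet, mri, w, w, w)
    print(fused_pet.shape)  # (8, 4, 4)
```

Because the attention matrix has (2·H·W)² entries, a design like this is only practical on low-resolution, high-level feature maps, which is consistent with applying the fusion after the encoder rather than at full image resolution.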
RESULTS To demonstrate the advantages of our method, we conduct configuration studies of the TFE, ablation studies and empirical comparative studies. Quantitative analyses are based on the root mean square error (RMSE), peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and Pearson correlation coefficient (PCC). Qualitative results visually compare our method with the other existing methods. All experimental results and visualizations show that our method achieves state-of-the-art performance in both quantitative and qualitative evaluations. CONCLUSIONS Based on our experiments, STFNet outperforms existing methods in both quantitative metrics and visual quality. However, our method may still be suboptimal because we train the network using only the L1 loss, and the dataset includes corrupted PET images at different low-count levels. In the future, we may adopt a GAN-based paradigm in STFNet to further improve visual quality.
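The four evaluation metrics named in RESULTS can be computed with a short NumPy sketch. The abstract does not state how the images were normalized or which data range was used for PSNR and SSIM, so the following are assumptions: `data_range` defaults to the dynamic range of the reference (standard-dose) image, and `global_ssim` is a simplified SSIM evaluated over a single window covering the whole image, whereas the usual SSIM averages over local (e.g. Gaussian-weighted) windows. All function names are hypothetical.

```python
import numpy as np


def _as_float(pred, ref):
    # Convert both inputs to float arrays of the same shape.
    pred, ref = np.asarray(pred, dtype=float), np.asarray(ref, dtype=float)
    assert pred.shape == ref.shape, "prediction and reference must match"
    return pred, ref


def rmse(pred, ref):
    # Root mean square error between prediction and reference.
    pred, ref = _as_float(pred, ref)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))


def psnr(pred, ref, data_range=None):
    # PSNR = 20 * log10(data_range / RMSE); data_range defaults (by assumption)
    # to the dynamic range of the reference image.
    pred, ref = _as_float(pred, ref)
    if data_range is None:
        data_range = float(ref.max() - ref.min())
    err = rmse(pred, ref)
    return float("inf") if err == 0 else float(20.0 * np.log10(data_range / err))


def pcc(pred, ref):
    # Pearson correlation coefficient over all voxels.
    pred, ref = _as_float(pred, ref)
    return float(np.corrcoef(pred.ravel(), ref.ravel())[0, 1])


def global_ssim(pred, ref, data_range=None, k1=0.01, k2=0.03):
    # Simplified SSIM using one window over the whole image (not the windowed SSIM).
    pred, ref = _as_float(pred, ref)
    if data_range is None:
        data_range = float(ref.max() - ref.min())
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_x, mu_y = pred.mean(), ref.mean()
    var_x, var_y = pred.var(), ref.var()
    cov = np.mean((pred - mu_x) * (ref - mu_y))
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)


if __name__ == "__main__":
    ref = np.array([0.0, 1.0, 2.0, 3.0])
    pred = ref + 0.1
    print(rmse(pred, ref), psnr(pred, ref), pcc(pred, ref), global_ssim(pred, ref))
```

For reported results, a windowed SSIM implementation such as `skimage.metrics.structural_similarity` would be closer to standard practice than the single-window simplification above.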