Paper information - Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction


Hyperspectral image (HSI) reconstruction aims to recover the 3D spatial-spectral signal from a 2D measurement in a coded aperture snapshot spectral imaging (CASSI) system. HSI representations are highly similar and correlated across the spectral dimension, so modeling inter-spectra interactions benefits HSI reconstruction. However, existing CNN-based methods are limited in capturing spectral-wise similarity and long-range dependencies. In addition, the HSI information in CASSI is modulated by a coded aperture (physical mask), yet current algorithms have not fully explored how the mask can guide HSI restoration. In this paper, we propose a novel framework, Mask-guided Spectral-wise Transformer (MST), for HSI reconstruction. Specifically, we present a Spectral-wise Multi-head Self-Attention (S-MSA) that treats each spectral feature as a token and calculates self-attention along the spectral dimension. In addition, we customize a Mask-guided Mechanism (MM) that directs S-MSA to pay attention to spatial regions with high-fidelity spectral representations. Extensive experiments show that our MST significantly outperforms state-of-the-art (SOTA) methods on simulation and real HSI datasets while requiring dramatically lower computational and memory costs.
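
The idea of treating each spectral channel as a token can be illustrated with a minimal single-head sketch. This is not the authors' implementation. The function names, the random projection weights, the fixed `scale` factor, and the way the mask reweights the values are all assumptions made for illustration; the abstract states only that attention is computed along the spectral dimension and that the mask guides it. Plain Python lists are used so that the shapes stay visible.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax_columns(a):
    """Normalize each column so that it sums to 1 (numerically stable)."""
    cols = []
    for col in zip(*a):
        peak = max(col)
        exps = [math.exp(v - peak) for v in col]
        total = sum(exps)
        cols.append([e / total for e in exps])
    return transpose(cols)

def spectral_attention(x, w_q, w_k, w_v, mask=None, scale=1.0):
    """Single-head attention over spectral channels (illustrative only).

    x is N x C: N spatial positions, C spectral channels.
    Returns the N x C output and the C x C attention map.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    if mask is not None:
        # Assumed form of mask guidance: reweight values per position and channel.
        v = [[vi * mi for vi, mi in zip(v_row, m_row)] for v_row, m_row in zip(v, mask)]
    # Each channel (column) is one token, so the score matrix is C x C, not N x N.
    scores = [[scale * s for s in row] for row in matmul(transpose(k), q)]
    attn = softmax_columns(scores)
    return matmul(v, attn), attn

random.seed(0)
N, C = 16, 4  # hypothetical sizes: a 4x4 spatial patch with 4 spectral channels
x = [[random.gauss(0, 1) for _ in range(C)] for _ in range(N)]
w_q, w_k, w_v = ([[random.gauss(0, 0.5) for _ in range(C)] for _ in range(C)] for _ in range(3))
mask = [[random.random() for _ in range(C)] for _ in range(N)]  # stand-in for a mask-derived weight map
out, attn = spectral_attention(x, w_q, w_k, w_v, mask=mask)
print(len(out), len(out[0]), len(attn), len(attn[0]))  # 16 4 4 4
```

Because the attention map is C x C rather than N x N, its size depends on the number of spectral channels and not on the spatial resolution, which is consistent with the efficiency claim in the abstract.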
