Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

Hyperspectral image (HSI) reconstruction aims to recover the 3D spatial-spectral signal from a 2D measurement in the coded aperture snapshot spectral imaging (CASSI) system. HSI representations are highly similar and correlated across the spectral dimension, so modeling inter-spectra interactions is beneficial for HSI reconstruction. However, existing CNN-based methods show limitations in capturing spectral-wise similarity and long-range dependencies. In addition, the HSI information is modulated by a coded aperture (physical mask) in CASSI, yet current algorithms have not fully explored the guidance effect of the mask for HSI restoration. In this paper, we propose a novel framework, Mask-guided Spectral-wise Transformer (MST), for HSI reconstruction. Specifically, we present a Spectral-wise Multi-head Self-Attention (S-MSA) that treats each spectral feature as a token and calculates self-attention along the spectral dimension. In addition, we customize a Mask-guided Mechanism (MM) that directs S-MSA to pay attention to spatial regions with high-fidelity spectral representations. Extensive experiments show that our MST significantly outperforms state-of-the-art (SOTA) methods on simulation and real HSI datasets while requiring dramatically lower computational and memory costs. https://github.com/caiyuanhao1998/MST/
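To make the idea of spectral-wise attention concrete, the following is a minimal NumPy sketch, not the authors' implementation. It treats each spectral channel of an H x W x C feature map as one token, so each head's attention map has size (C/heads) x (C/heads) rather than HW x HW. The function name `spectral_self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the fixed `1/sqrt(HW)` scaling are illustrative assumptions; the mask guidance described in the abstract is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spectral_self_attention(x, w_q, w_k, w_v, num_heads=1):
    """Self-attention computed along the spectral (channel) dimension.

    x:             feature map of shape (H, W, C)
    w_q, w_k, w_v: hypothetical projection matrices of shape (C, C)
    Returns the output of shape (H, W, C) and the per-head attention maps.
    """
    h, w, c = x.shape
    assert c % num_heads == 0, "C must be divisible by num_heads"
    d = c // num_heads

    tokens = x.reshape(h * w, c)  # rows: pixels, columns: spectral channels
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v

    out = np.empty_like(v)
    attn_maps = []
    scale = 1.0 / np.sqrt(h * w)  # assumed scaling, chosen for this sketch
    for j in range(num_heads):
        s = slice(j * d, (j + 1) * d)
        qj, kj, vj = q[:, s], k[:, s], v[:, s]  # each (HW, d)
        # Each channel is a token: the attention map is (d, d), not (HW, HW).
        attn = softmax(scale * (kj.T @ qj), axis=-1)
        out[:, s] = vj @ attn
        attn_maps.append(attn)
    return out.reshape(h, w, c), attn_maps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 5, 8))
    w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
    y, maps = spectral_self_attention(x, w_q, w_k, w_v, num_heads=2)
    print(y.shape, maps[0].shape)
```

Because the attention map is indexed by channels, its size does not depend on the spatial resolution, which is one plausible reading of why attention along the spectral dimension can be cheap for HSIs with few bands and many pixels.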
