Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

Hyperspectral image (HSI) reconstruction aims to recover the 3D spatial-spectral signal from a 2D measurement in the coded aperture snapshot spectral imaging (CASSI) system. HSI representations are highly similar and correlated across the spectral dimension, so modeling inter-spectra interactions is beneficial for HSI reconstruction. However, existing CNN-based methods show limitations in capturing spectral-wise similarity and long-range dependencies. Moreover, the HSI information is modulated by a coded aperture (physical mask) in CASSI, yet current algorithms have not fully explored the guidance effect of the mask for HSI restoration. In this paper, we propose a novel framework, Mask-guided Spectral-wise Transformer (MST), for HSI reconstruction. Specifically, we present a Spectral-wise Multi-head Self-Attention (S-MSA) that treats each spectral feature as a token and calculates self-attention along the spectral dimension. In addition, we customize a Mask-guided Mechanism (MM) that directs S-MSA to pay attention to spatial regions with high-fidelity spectral representations. Extensive experiments show that our MST significantly outperforms state-of-the-art (SOTA) methods on simulation and real HSI datasets while requiring dramatically lower computational and memory costs.
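The idea of treating each spectral feature as a token can be illustrated with a minimal NumPy sketch. This is not the MST implementation: it assumes a single head, random projection matrices, unit-normalized tokens, and a hypothetical per-pixel `mask_weight` that simply rescales the values as a stand-in for mask guidance. The function name `spectral_self_attention` and all parameters are illustrative only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spectral_self_attention(x, w_q, w_k, w_v, mask_weight=None):
    """Single-head self-attention along the spectral dimension (sketch).

    x           : (N, C) features, N = H * W spatial positions, C channels.
    w_q/w_k/w_v : (C, C) projection matrices (assumed, randomly initialized here).
    mask_weight : optional (N, 1) or (N, C) weights; a hypothetical stand-in
                  for mask guidance that rescales the values per position.
    Returns the (N, C) output and the (C, C) attention map.
    """
    q = x @ w_q
    k = x @ w_k
    v = x @ w_v

    if mask_weight is not None:
        # Assumption: guidance is modeled as an element-wise reweighting of V.
        v = v * mask_weight

    # Each column (one spectral channel over all pixels) is one token.
    # Normalizing tokens to unit length is an assumption made for stability.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-8)

    # Attention between spectral tokens: (C, N) @ (N, C) -> (C, C).
    attn = softmax(k.T @ q, axis=-1)

    # Output token i is the attention-weighted sum of value tokens j.
    out = v @ attn.T
    return out, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    height, width, channels = 8, 8, 28  # illustrative sizes
    x = rng.standard_normal((height * width, channels))
    w_q, w_k, w_v = (0.1 * rng.standard_normal((channels, channels)) for _ in range(3))
    out, attn = spectral_self_attention(x, w_q, w_k, w_v)
    print(out.shape, attn.shape)
```

In this sketch the attention map has size C x C regardless of the number of pixels, so its cost grows linearly with the spatial size H * W. That is one plausible reading of why attending along the spectral dimension can be cheaper than spatial self-attention, whose map would be (H * W) x (H * W).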
