MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction

Existing leading methods for spectral reconstruction (SR) focus on designing deeper or wider convolutional neural networks (CNNs) to learn the end-to-end mapping from an RGB image to its hyperspectral image (HSI). These CNN-based methods achieve impressive restoration performance but are limited in capturing long-range dependencies and the self-similarity prior. To cope with this problem, we propose a novel Transformer-based method, Multi-stage Spectral-wise Transformer (MST++), for efficient spectral reconstruction. In particular, we employ Spectral-wise Multi-head Self-attention (S-MSA), which is based on the spatially sparse yet spectrally self-similar nature of HSIs, to compose the basic unit, the Spectral-wise Attention Block (SAB). SABs then build up the Single-stage Spectral-wise Transformer (SST), which exploits a U-shaped structure to extract multi-resolution contextual information. Finally, our MST++, a cascade of several SSTs, progressively improves the reconstruction quality from coarse to fine. Comprehensive experiments show that our MST++ significantly outperforms other state-of-the-art methods. In the NTIRE 2022 Spectral Reconstruction Challenge, our approach won first place. Code and pre-trained models are publicly available at https://github.com/ .
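To make the idea of spectral-wise attention concrete, the following is a minimal illustrative sketch, not the released implementation. It assumes a single head, no learned projections (queries, keys, and values are all the input itself), and a fixed scale; the function names `channel_attention_map` and `spectral_attention` are hypothetical. Each spectral channel is treated as one token, so attention is computed between channels rather than between pixels.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention_map(x, scale=1.0):
    """Return a C x C attention map for x, a list of N pixels with C channels each.

    Assumption: Q = K = x (no learned projections), single head.
    """
    n, c = len(x), len(x[0])
    # scores[i][j] measures how similar channel i is to channel j over all pixels.
    scores = [
        [scale * sum(x[p][i] * x[p][j] for p in range(n)) for j in range(c)]
        for i in range(c)
    ]
    # Normalize each row so the weights for one channel sum to 1.
    return [softmax(r) for r in scores]


def spectral_attention(x, scale=1.0):
    """Re-weight every pixel's channels with the channel attention map (V = x)."""
    n, c = len(x), len(x[0])
    attn = channel_attention_map(x, scale)
    # out[p][i] is a weighted mix of all channels of pixel p.
    return [
        [sum(attn[i][j] * x[p][j] for j in range(c)) for i in range(c)]
        for p in range(n)
    ]


if __name__ == "__main__":
    pixels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # N = 3 pixels, C = 2 channels
    print(spectral_attention(pixels))
```

In this sketch the attention map has size C x C regardless of the number of pixels N, so the cost of building it grows linearly with N, whereas a pixel-to-pixel attention map would have size N x N.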
