Paper Information - MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction

Existing leading methods for spectral reconstruction (SR) focus on designing deeper or wider convolutional neural networks (CNNs) to learn the end-to-end mapping from an RGB image to its hyperspectral image (HSI). These CNN-based methods achieve impressive restoration performance but are limited in capturing long-range dependencies and the self-similarity prior. To cope with this problem, we propose a novel Transformer-based method, Multi-stage Spectral-wise Transformer (MST++), for efficient spectral reconstruction. In particular, we employ Spectral-wise Multi-head Self-attention (S-MSA), which is based on the spatially sparse yet spectrally self-similar nature of HSIs, to compose the basic unit, the Spectral-wise Attention Block (SAB). SABs then build up the Single-stage Spectral-wise Transformer (SST), which exploits a U-shaped structure to extract multi-resolution contextual information. Finally, our MST++, a cascade of several SSTs, progressively improves the reconstruction quality from coarse to fine. Comprehensive experiments show that our MST++ significantly outperforms other state-of-the-art methods. In the NTIRE 2022 Spectral Reconstruction Challenge, our approach won first place. Code and pre-trained models are publicly available at https://github.com/ .
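
To make the spectral-wise attention idea more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes a single head, uses identity matrices in place of the learned query/key/value projections, and the names `spectral_attention` and `scale` are hypothetical. The only point it illustrates is that each spectral channel is treated as one token, so the attention map is C x C (channels by channels) instead of N x N (pixels by pixels).

```python
import math


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def spectral_attention(x, scale=1.0):
    """Single-head attention across spectral channels (illustrative only).

    x: N rows (flattened pixels), each holding C channel values.
    Returns (out, attn): out has the same N x C shape as x, attn is C x C.
    """
    # Each spectral channel becomes one token of length N.
    tokens = transpose(x)                      # C x N
    # Assumption: identity projections stand in for learned Q/K/V weights.
    q = k = v = tokens
    # Channel-to-channel similarity scores, scaled before the softmax.
    scores = matmul(q, transpose(k))           # C x C
    scores = [[scale * s for s in row] for row in scores]
    attn = softmax_rows(scores)                # each row sums to 1
    # Every output channel is a weighted mix of all input channels.
    out_tokens = matmul(attn, v)               # C x N
    return transpose(out_tokens), attn


# Toy feature map: 4 pixels, 3 spectral channels.
x = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9],
     [1.0, 1.1, 1.2]]
out, attn = spectral_attention(x)
print(len(attn), len(attn[0]))   # attention map size depends on C, not on N
print(len(out), len(out[0]))     # output keeps the input shape
```

Because the attention map in this sketch is C x C, its size does not grow with the number of pixels, and the two matrix products scale linearly in N. This is one plausible reading of why attending along the spectral dimension suits HSIs that are spatially sparse but spectrally self-similar.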
