Spectral Compressive Imaging Reconstruction Using Convolution and Spectral Contextual Transformer

Spectral compressive imaging (SCI) encodes a high-dimensional hyperspectral image into a single 2D measurement and then uses an algorithm to reconstruct the spatio-spectral data cube. At present, the main bottleneck of SCI is the reconstruction algorithm: state-of-the-art (SOTA) reconstruction methods generally suffer from long reconstruction time, poor detail recovery, or both. In this paper, we propose a novel hybrid network module, the CSCoT (Convolution and Spectral Contextual Transformer) block, which combines the local perception of convolution with the global perception of the transformer and thereby improves reconstruction quality and the recovery of fine details. We integrate the proposed CSCoT block into a deep unfolding framework based on the generalized alternating projection (GAP) algorithm, yielding the GAP-CSCoT network, and apply it to SCI reconstruction. In extensive experiments on synthetic and real data, the proposed model achieves higher reconstruction quality (>2 dB in PSNR on simulated benchmark datasets) and shorter running time than existing SOTA algorithms. The code and models will be released to the public.
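To make the unfolding idea concrete, the following is a minimal NumPy sketch of a GAP-style iteration, not the GAP-CSCoT implementation. It assumes a simplified sensing model in which the measurement is the sum over bands of the mask-modulated cube (the dispersion shift of a real CASSI system is omitted), and it replaces the learned CSCoT stage with a hypothetical `denoiser` callable. The function names `forward`, `adjoint`, and `gap_reconstruct` are illustrative only.

```python
import numpy as np

def forward(x, masks):
    """Simplified SCI forward model: modulate each band and sum to a 2D measurement.

    x, masks: arrays of shape (H, W, B). Returns an (H, W) array.
    Assumption: no dispersion shift, unlike a real CASSI system.
    """
    return np.sum(masks * x, axis=2)

def adjoint(y, masks):
    """Transpose of the forward model: spread a 2D array back over the bands."""
    return masks * y[:, :, None]

def gap_reconstruct(y, masks, denoiser, n_stages=10, eps=1e-8):
    """GAP-style alternation between a data-consistency projection and a prior step.

    `denoiser` is a placeholder for the learned per-stage network
    (hypothetical here; any callable mapping (H, W, B) -> (H, W, B)).
    """
    # For this forward model, Phi Phi^T is diagonal: the per-pixel sum of squared masks.
    phi_phit = np.sum(masks ** 2, axis=2) + eps
    v = adjoint(y, masks)  # simple initialization
    for _ in range(n_stages):
        # Projection step: move v onto the set of cubes consistent with y.
        residual = (y - forward(v, masks)) / phi_phit
        x = v + adjoint(residual, masks)
        # Prior step: in a deep unfolding network this is a trained module.
        v = denoiser(x)
    return v
```

Calling `gap_reconstruct(y, masks, denoiser=lambda x: x)` runs the projection alone. In a deep unfolding network, each stage would instead use its own trained module, and the number of stages is fixed at training time.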
