Spectral Compressive Imaging Reconstruction Using Convolution and Contextual Transformer

Spectral compressive imaging (SCI) encodes a high-dimensional hyperspectral image into a single 2D measurement and then uses an algorithm to reconstruct the spatio-spectral data cube. At present, the main bottleneck of SCI is the reconstruction algorithm: state-of-the-art (SOTA) reconstruction methods generally suffer from long reconstruction times and/or poor recovery of fine details. In this paper, we propose a novel hybrid network module, the CCoT (Convolution and Contextual Transformer) block, which combines the inductive bias of convolution with the powerful modeling ability of the transformer and thereby helps the reconstruction restore fine details. We integrate the proposed CCoT block into a deep unfolding framework based on the generalized alternating projection (GAP) algorithm and obtain the GAP-CCoT network. In extensive experiments on synthetic and real data, our proposed model achieves higher reconstruction quality ($>$2 dB in PSNR on simulated benchmark datasets) and shorter running time than existing SOTA algorithms by a large margin. The code and models are publicly available at https://github.com/ucaswangls/GAP-CCoT.
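To make the deep unfolding idea concrete, the following is a minimal NumPy sketch of a GAP-style loop: each stage projects the current estimate onto the set of signals consistent with the measurement, then applies a denoiser. It is an illustration under simplifying assumptions, not the authors' implementation. The forward model here is a plain per-band mask-and-sum (the dispersion shift of a real coded-aperture system is ignored), the function names `forward`, `adjoint`, `gap_projection`, and `gap_unfold` are hypothetical, and the denoiser is an identity placeholder standing in for the learned CCoT-based network.

```python
import numpy as np


def forward(x, masks):
    """Simplified SCI forward model: modulate each band by its mask and sum.

    x, masks: arrays of shape (bands, height, width); returns (height, width).
    Assumption: no dispersion shift, unlike a real coded-aperture system.
    """
    return np.sum(masks * x, axis=0)


def adjoint(y, masks):
    """Adjoint of the forward model: spread the measurement back over the bands."""
    return masks * y[None, :, :]


def gap_projection(v, y, masks, eps=1e-8):
    """Euclidean projection of v onto {x : forward(x) = y}.

    For this mask-and-sum model, Phi Phi^T is diagonal, so the
    inversion reduces to an element-wise division.
    """
    phi_phi_t = np.sum(masks ** 2, axis=0)
    residual = (y - forward(v, masks)) / (phi_phi_t + eps)
    return v + adjoint(residual, masks)


def placeholder_denoiser(x):
    """Identity stand-in for the learned prior (hypothetical placeholder)."""
    return x


def gap_unfold(y, masks, num_stages=5, denoiser=placeholder_denoiser):
    """Run a fixed number of projection + denoising stages."""
    v = adjoint(y, masks)  # simple initialization from the measurement
    for _ in range(num_stages):
        x = gap_projection(v, y, masks)
        v = denoiser(x)
    return v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    masks = rng.uniform(0.5, 1.0, size=(4, 8, 8))
    cube = rng.uniform(size=(4, 8, 8))
    measurement = forward(cube, masks)
    estimate = gap_unfold(measurement, masks, num_stages=3)
    print(estimate.shape)
```

In a deep unfolding network, the number of stages is fixed in advance and each stage's denoiser is a trainable module, so the whole loop can be trained end to end; the sketch keeps the loop structure and leaves the learned part as a pluggable function.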
