Pre-Trained Image Processing Transformer

As the computing power of modern hardware increases rapidly, pre-trained deep learning models (e.g., BERT, GPT-3) learned on large-scale datasets have shown their effectiveness over conventional methods. This progress is mainly attributed to the representation ability of the transformer and its variant architectures. In this paper, we study low-level computer vision tasks (e.g., denoising, super-resolution and deraining) and develop a new pre-trained model, namely, the image processing transformer (IPT). To fully exploit the capability of the transformer, we propose to use the well-known ImageNet benchmark to generate a large number of corrupted image pairs. The IPT model is trained on these images with multiple heads and multiple tails. In addition, contrastive learning is introduced so that the model adapts well to different image processing tasks. The pre-trained model can therefore be efficiently employed on the desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks. Code is available at https://github.com/huawei-noah/Pretrained-IPT and https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/IPT
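
The idea of generating corrupted image pairs from clean images can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the noise level, the scale factor, and the box-filter downsampling are all assumptions chosen for illustration, and a random array stands in for an ImageNet crop.

```python
import numpy as np

def add_gaussian_noise(clean, sigma, rng):
    """Return a noisy copy of `clean` (float array with values in [0, 255])."""
    noisy = clean + rng.normal(0.0, sigma, size=clean.shape)
    return np.clip(noisy, 0.0, 255.0)

def downsample(clean, scale):
    """Box-filter downsampling by an integer factor.

    Assumption: a simple stand-in for whatever resampling kernel
    a real super-resolution pipeline would use.
    """
    h, w, c = clean.shape
    h, w = h - h % scale, w - w % scale  # crop so the size divides evenly
    blocks = clean[:h, :w].reshape(h // scale, scale, w // scale, scale, c)
    return blocks.mean(axis=(1, 3))

def make_pairs(clean, rng):
    """Build one (corrupted, clean) pair per hypothetical task."""
    return {
        "denoise_sigma30": (add_gaussian_noise(clean, 30.0, rng), clean),
        "sr_x2": (downsample(clean, 2), clean),
    }

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 255.0, size=(48, 48, 3))  # stand-in for an image crop
pairs = make_pairs(clean, rng)
```

Each entry maps a task name to a (corrupted, clean) pair, so a single clean image yields one supervised training example per task.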
