Perceptual Image Quality Assessment with Transformers

In this paper, we propose an image quality transformer (IQT) that successfully applies a transformer architecture to the perceptual full-reference image quality assessment (IQA) task. Perceptual representations have become increasingly important in image quality assessment. In this context, we extract perceptual feature representations from each of the input images using a convolutional neural network (CNN) backbone. The extracted feature maps are fed into the transformer encoder and decoder to compare the reference and distorted images. Following the approach of transformer-based vision models [18], [55], we use an extra learnable quality embedding together with position embeddings. The transformer output is passed to a prediction head that predicts the final quality score. Experimental results show that the proposed model achieves outstanding performance on standard IQA datasets. On a large-scale IQA dataset containing images produced by generative models, our model also shows promising results. The proposed IQT ranked first among 13 participants in the NTIRE 2021 perceptual image quality assessment challenge [23]. We believe our work offers an opportunity to further extend transformer-based approaches to the perceptual IQA task.
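
The pipeline described above can be summarized in a minimal sketch, assuming a ResNet-50 backbone, a standard PyTorch nn.Transformer, and a ViT-style token arrangement; the exact backbone, feature layers, and encoder/decoder input assignment used in IQT may differ from this illustration.

```python
# Minimal sketch of an IQT-style full-reference IQA model (illustrative only).
# Assumptions not stated in the abstract: ResNet-50 backbone, nn.Transformer,
# reference tokens to the encoder and distorted tokens to the decoder.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class IQTSketch(nn.Module):
    def __init__(self, d_model=256, nhead=4, num_layers=2, num_tokens=49):
        super().__init__()
        # CNN backbone used as a fixed perceptual feature extractor
        # (truncated before the global pooling and classification head).
        backbone = resnet50(weights="IMAGENET1K_V1")
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        for p in self.backbone.parameters():
            p.requires_grad = False

        # 1x1 projection from backbone channels (2048) to the transformer width.
        self.proj = nn.Conv2d(2048, d_model, kernel_size=1)

        # Extra learnable quality embedding and learnable position embeddings,
        # following the transformer-based vision models cited in the abstract.
        self.quality_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens + 1, d_model))

        # Encoder consumes reference-image tokens; the decoder cross-attends
        # to them with distorted-image tokens as queries.
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )

        # Prediction head maps the quality-token output to a scalar score.
        self.head = nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, 1))

    def _tokens(self, img):
        """Backbone feature map -> token sequence with a prepended quality token."""
        f = self.proj(self.backbone(img))           # (B, d_model, H', W')
        f = f.flatten(2).transpose(1, 2)            # (B, H'*W', d_model)
        q = self.quality_token.expand(f.size(0), -1, -1)
        f = torch.cat([q, f], dim=1)                # (B, 1 + H'*W', d_model)
        return f + self.pos_embed[:, : f.size(1)]

    def forward(self, ref, dist):
        ref_tokens = self._tokens(ref)
        dist_tokens = self._tokens(dist)
        out = self.transformer(src=ref_tokens, tgt=dist_tokens)
        return self.head(out[:, 0]).squeeze(-1)     # predicted quality score


# Usage: 224x224 inputs give a 7x7 backbone grid, i.e. 49 tokens (+1 quality token).
model = IQTSketch(num_tokens=49)
ref = torch.randn(2, 3, 224, 224)
dist = torch.randn(2, 3, 224, 224)
scores = model(ref, dist)   # shape: (2,)
```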

[1] Haoyu Chen, et al. Image Quality Assessment for Perceptual Image Restoration: A New Dataset, Benchmark and Metric, 2020, ArXiv.

[2] Radu Timofte, et al. 2018 PIRM Challenge on Perceptual Image Super-resolution, 2018, ArXiv.

[3] Vlad Hosu, et al. KADID-10k: A Large-scale Artificially Distorted IQA Database, 2019, 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX).

[4] Baining Guo, et al. Learning Texture Transformer Network for Image Super-Resolution, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Eero P. Simoncelli, et al. Image Quality Assessment: Unifying Structure and Texture Similarity, 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6] Jun-Hyuk Kim, et al. Deep Learning-based Image Super-Resolution Considering Quantitative and Perceptual Quality, 2018, Neurocomputing.

[7] Martin Reisslein, et al. Objective Video Quality Assessment Methods: A Classification, Review, and Performance Comparison, 2011, IEEE Transactions on Broadcasting.

[8] Zhou Wang, et al. Applications of Objective Image Quality Assessment Methods, 2011.

[9] Alan C. Bovik, et al. RRED Indices: Reduced Reference Entropic Differencing for Image Quality Assessment, 2012, IEEE Transactions on Image Processing.

[10] Shiqi Wang, et al. Comparison of Full-Reference Image Quality Models for Optimization of Image Processing Systems, 2021, International Journal of Computer Vision.

[11] Nicolas Usunier, et al. End-to-End Object Detection with Transformers, 2020, ECCV.

[12] Junyong You, et al. Transformer For Image Quality Assessment, 2020, 2021 IEEE International Conference on Image Processing (ICIP).

[13] Wilson S. Geisler, et al. Image quality assessment based on a degradation model, 2000, IEEE Transactions on Image Processing.

[14] Hong Cai, et al. PieAPP: Perceptual Image-Error Assessment Through Pairwise Preference, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15] Radu Timofte, et al. NTIRE 2021 Challenge on Perceptual Image Quality Assessment, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[16] Lei Zhang, et al. RFSIM: A feature based image quality assessment metric using Riesz transforms, 2010, 2010 IEEE International Conference on Image Processing.

[17] Dustin Tran, et al. Image Transformer, 2018, ICML.

[18] Fahad Shahbaz Khan, et al. Transformers in Vision: A Survey, 2021, ACM Computing Surveys.

[19] A. Yuille, et al. Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation, 2020, ECCV.

[20] Nal Kalchbrenner, et al. Colorization Transformer, 2021, ICLR.

[21] Wen Gao, et al. Pre-Trained Image Processing Transformer, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Alan C. Bovik, et al. A Statistical Evaluation of Recent Full Reference Image Quality Assessment Algorithms, 2006, IEEE Transactions on Image Processing.

[23] Alexei A. Efros, et al. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24] Hongyu Li, et al. SR-SIM: A fast and high performance IQA index based on spectral residual, 2012, 2012 19th IEEE International Conference on Image Processing.

[25] Leida Li, et al. Subjective and objective quality assessment for image restoration: A critical survey, 2020, Signal Processing: Image Communication.

[26] Weisi Lin, et al. Image Quality Assessment Based on Gradient Similarity, 2012, IEEE Transactions on Image Processing.

[27] Jong-Seok Lee, et al. Subjective and Objective Quality Assessment of Compressed 4K UHD Videos for Immersive Experience, 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[28] Haoyu Chen, et al. PIPAL: a Large-Scale Image Quality Assessment Dataset for Perceptual Image Restoration, 2020, ECCV.

[29] A. Bovik, et al. A universal image quality index, 2002, IEEE Signal Processing Letters.

[30] Zhou Wang, et al. Multiscale structural similarity for image quality assessment, 2003, The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003.

[31] Valero Laparra, et al. Perceptual image quality assessment using a normalized Laplacian pyramid, 2016, HVEI.

[32] Shiyu Chang, et al. TransGAN: Two Transformers Can Make One Strong GAN, 2021, ArXiv.

[33] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[34] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[35] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.

[36] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[37] Guangtao Zhai, et al. Perceptual image quality assessment: a survey, 2020, Science China Information Sciences.

[38] Jun-Hyuk Kim, et al. Generative adversarial network-based image super-resolution using perceptual content losses, 2018, ECCV Workshops.

[39] Lei Zhang, et al. Gradient Magnitude Similarity Deviation: A Highly Efficient Perceptual Image Quality Index, 2013, IEEE Transactions on Image Processing.

[40] Kede Ma, et al. Perceptual Quality Assessment of Smartphone Photography, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41] Eric C. Larson, et al. Most apparent distortion: full-reference image quality assessment and the role of strategy, 2010, Journal of Electronic Imaging.

[42] Zhou Wang, et al. Applications of Objective Image Quality Assessment Methods [Applications Corner], 2011, IEEE Signal Processing Magazine.

[43] Karel Fliegel, et al. On the accuracy of objective image and video quality models: New methodology for performance evaluation, 2016, 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX).

[44] Eero P. Simoncelli, et al. Image quality assessment: from error visibility to structural similarity, 2004, IEEE Transactions on Image Processing.

[45] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[46] Gustavo de Veciana, et al. An information fidelity criterion for image quality assessment using natural scene statistics, 2005, IEEE Transactions on Image Processing.

[47] Yu Qiao, et al. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks, 2018, ECCV Workshops.

[48] Patrick Le Callet, et al. Ambiguity of Objective Image Quality Metrics: A New Methodology for Performance Evaluation, 2021, Signal Processing: Image Communication.

[49] David Zhang, et al. FSIM: A Feature Similarity Index for Image Quality Assessment, 2011, IEEE Transactions on Image Processing.

[50] Nikolay N. Ponomarenko, et al. Image database TID2013: Peculiarities, results and perspectives, 2015, Signal Processing: Image Communication.

[51] Matthieu Cord, et al. Training data-efficient image transformers & distillation through attention, 2020, ICML.

[52] Alan C. Bovik, et al. Image information and visual quality, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[53] Alan C. Bovik, et al. Making a "Completely Blind" Image Quality Analyzer, 2013, IEEE Signal Processing Letters.

[54] Hongyu Li, et al. VSI: A Visual Saliency-Induced Index for Perceptual Image Quality Assessment, 2014, IEEE Transactions on Image Processing.

[55] Sheila S. Hemami, et al. VSNR: A Wavelet-Based Visual Signal-to-Noise Ratio for Natural Images, 2007, IEEE Transactions on Image Processing.

[56] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.

[57] Mark Chen, et al. Generative Pretraining From Pixels, 2020, ICML.

[58] Sergey Ioffe, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 2016, AAAI.

[59] D. Saupe, et al. KonIQ-10k: An Ecologically Valid Database for Deep Learning of Blind Image Quality Assessment, 2019, IEEE Transactions on Image Processing.