MUSIQ: Multi-scale Image Quality Transformer

Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed shape constraint in batch training. To accommodate this, the input images are usually resized and cropped to a fixed shape, causing image quality degradation. To address this, we design a multi-scale image quality Transformer (MUSIQ) to process native resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding is proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method can achieve state-of-the-art performance on multiple large scale IQA datasets such as PaQ-2-PiQ [41], SPAQ [11], and KonIQ-10k [16]. 1

[1]  Junyong You,et al.  Transformer For Image Quality Assessment , 2020, 2021 IEEE International Conference on Image Processing (ICIP).

[2]  Wen Gao,et al.  Pre-Trained Image Processing Transformer , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  S. Gelly,et al.  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.

[4]  Mark Chen,et al.  Generative Pretraining From Pixels , 2020, ICML.

[5]  Yu Zhu,et al.  Blindly Assess Image Quality in the Wild Guided by a Self-Adaptive Hyper Network , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Kede Ma,et al.  Perceptual Quality Assessment of Smartphone Photography , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Nicolas Usunier,et al.  End-to-End Object Detection with Transformers , 2020, ECCV.

[8]  Guangming Shi,et al.  MetaIQA: Deep Meta-Learning for No-Reference Image Quality Assessment , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Jianping Fan,et al.  Adaptive Fractional Dilated Convolution Network for Image Aesthetics Assessment , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Praful Gupta,et al.  From Patches to Pictures (PaQ-2-PiQ): Mapping the Perceptual Space of Picture Quality , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Dietmar Saupe,et al.  KonIQ-10k: An Ecologically Valid Database for Deep Learning of Blind Image Quality Assessment , 2019, IEEE Transactions on Image Processing.

[12]  Quoc V. Le,et al.  Randaugment: Practical automated data augmentation with a reduced search space , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[13]  Omer Levy,et al.  RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.

[14]  Zhou Wang,et al.  Blind Image Quality Assessment Using a Deep Bilinear Convolutional Neural Network , 2019, IEEE Transactions on Circuits and Systems for Video Technology.

[15]  Yiming Yang,et al.  XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.

[16]  Weisi Lin,et al.  Which Has Better Visual Quality: The Clear Blue Sky or a Blurry Animal? , 2019, IEEE Transactions on Multimedia.

[17]  Quoc V. Le,et al.  Attention Augmented Convolutional Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  Dietmar Saupe,et al.  Effective Aesthetics Prediction With Multi-Level Spatially Pooled Features , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Feiyue Huang,et al.  Attention-based Multi-Patch Aggregation for Image Aesthetic Assessment , 2018, ACM Multimedia.

[20]  Ashish Vaswani,et al.  Self-Attention with Relative Position Representations , 2018, NAACL.

[21]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Hongyi Zhang,et al.  mixup: Beyond Empirical Risk Minimization , 2017, ICLR.

[23]  Peyman Milanfar,et al.  NIMA: Neural Image Assessment , 2017, IEEE Transactions on Image Processing.

[24]  Lei Zhang,et al.  A Probabilistic Quality Representation Approach to Deep Blind Image Quality Prediction , 2017, ArXiv.

[25]  Naila Murray,et al.  A deep architecture for unified aesthetic prediction , 2017, ArXiv.

[26]  Chen Sun,et al.  Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[28]  Yann Dauphin,et al.  Convolutional Sequence to Sequence Learning , 2017, ICML.

[29]  Shuang Ma,et al.  A-Lamp: Adaptive Layout-Aware Multi-patch Deep Convolutional Neural Network for Photo Aesthetic Assessment , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Sanghoon Lee,et al.  Fully Deep Blind Image Quality Predictor , 2017, IEEE Journal of Selected Topics in Signal Processing.

[31]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Sebastian Bosse,et al.  Deep Neural Networks for No-Reference and Full-Reference Image Quality Assessment , 2016, IEEE Transactions on Image Processing.

[33]  Alan C. Bovik,et al.  Perceptual quality prediction on authentically distorted images using a bag of features approach , 2016, Journal of vision.

[34]  Yong Liu,et al.  Blind Image Quality Assessment Based on High Order Statistics Aggregation , 2016, IEEE Transactions on Image Processing.

[35]  Hailin Jin,et al.  Composition-Preserving Deep Photo Aesthetics Assessment , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Radomír Mech,et al.  Photo Aesthetics Ranking Network with Attributes and Content Adaptation , 2016, ECCV.

[37]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Lei Zhang,et al.  A Feature-Enriched Completely Blind Image Quality Evaluator , 2015, IEEE Transactions on Image Processing.

[39]  David A. Shamma,et al.  YFCC100M , 2015, Commun. ACM.

[40]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[41]  Yi Li,et al.  Convolutional Neural Networks for No-Reference Image Quality Assessment , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Lei Zhang,et al.  Learning without Human Scores for Blind Image Quality Assessment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Alan C. Bovik,et al.  Making a “Completely Blind” Image Quality Analyzer , 2013, IEEE Signal Processing Letters.

[44]  Alan C. Bovik,et al.  No-Reference Image Quality Assessment in the Spatial Domain , 2012, IEEE Transactions on Image Processing.

[45]  Naila Murray,et al.  AVA: A large-scale database for aesthetic visual analysis , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  David S. Doermann,et al.  Unsupervised feature learning framework for no-reference image quality assessment , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Alan C. Bovik,et al.  Blind Image Quality Assessment: From Natural Scene Statistics to Perceptual Quality , 2011, IEEE Transactions on Image Processing.

[48]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[49]  Edward H. Adelson,et al.  PYRAMID METHODS IN IMAGE PROCESSING. , 1984 .

[50]  Lei Zhang,et al.  A Unified Probabilistic Formulation of Image Aesthetic Assessment , 2020, IEEE Transactions on Image Processing.

[51]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[52]  John Hannah,et al.  IEEE International Conference on Image Processing (ICIP) , 1997 .