Spatiotemporal Feature Hierarchy-Based Blind Prediction of Natural Video Quality via Transfer Learning

In this paper, we propose a pyramidal spatiotemporal feature hierarchy (PSFH)-based no-reference (NR) video quality assessment (VQA) method using transfer learning. First, we generate simulated videos with a generative adversarial network (GAN)-based image restoration model. The residual maps between the distorted frames and the simulated frames, which capture rich degradation information, are used as one input of the quality regression network. Second, we use 3D convolution operations to construct a PSFH network with five stages. The spatiotemporal features, which incorporate the shared features transferred from the pretrained image restoration model, are fused stage by stage. Third, guided by the transferred knowledge, each stage generates multiple feature mapping layers that encode different semantics and degradation information using 3D convolution layers and gated recurrent units (GRUs). Finally, five approximate perceptual quality scores and one precise prediction score are obtained by fully connected (FC) networks. The whole model is trained with a carefully designed loss function that combines a pseudo-Huber loss and a Pearson linear correlation coefficient (PLCC) loss to improve robustness and prediction accuracy. Extensive experiments show that the proposed method achieves outstanding results compared with other state-of-the-art methods. Both the source code and models are available online.
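The abstract does not state the exact form or weighting of the combined loss. As a minimal sketch only, assuming the PLCC term is written as 1 - PLCC and added to the mean pseudo-Huber loss with a hypothetical weight `alpha` (the names `delta`, `alpha`, and all function names are illustrative and not taken from the paper), the idea could look like this:

```python
import math
from typing import Sequence


def pseudo_huber(pred: Sequence[float], target: Sequence[float],
                 delta: float = 1.0) -> float:
    """Mean pseudo-Huber loss: delta^2 * (sqrt(1 + (r / delta)^2) - 1).

    Behaves like a squared error for small residuals r and like an
    absolute error for large ones, which limits the effect of outliers.
    """
    total = 0.0
    for p, t in zip(pred, target):
        r = p - t
        total += delta ** 2 * (math.sqrt(1.0 + (r / delta) ** 2) - 1.0)
    return total / len(pred)


def plcc(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson linear correlation coefficient between two score lists."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    denom = math.sqrt(var_p * var_t)
    # Assumption: treat a constant input (zero variance) as uncorrelated.
    return cov / denom if denom > 0 else 0.0


def combined_loss(pred: Sequence[float], target: Sequence[float],
                  delta: float = 1.0, alpha: float = 1.0) -> float:
    """Assumed combination: pseudo-Huber + alpha * (1 - PLCC)."""
    return pseudo_huber(pred, target, delta) + alpha * (1.0 - plcc(pred, target))


if __name__ == "__main__":
    predicted = [2.9, 3.4, 4.1, 1.8]   # illustrative predicted scores
    subjective = [3.0, 3.5, 4.0, 2.0]  # illustrative subjective scores
    print(combined_loss(predicted, subjective))
```

In this sketch the pseudo-Huber term penalizes the per-video score error, while the 1 - PLCC term rewards predictions that vary linearly with the subjective scores across a batch. In a real training setup both terms would be written with a differentiable tensor library rather than plain Python.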
