Unified Quality Assessment of in-the-Wild Videos with Mixed Datasets Training

Video quality assessment (VQA) is an important problem in computer vision. The videos in computer vision applications are usually captured in the wild. We focus on automatically assessing the quality of in-the-wild videos, which is challenging due to the absence of reference videos, the complexity of distortions, and the diversity of video contents. Moreover, the video contents and distortions differ considerably among existing datasets, which leads to poor performance of data-driven methods in the cross-dataset evaluation setting. To improve the performance of quality assessment models, we borrow intuitions from human perception, specifically the content dependency and temporal-memory effects of the human visual system. To address the cross-dataset evaluation challenge, we explore a mixed datasets training strategy for training a single VQA model with multiple datasets. The proposed unified framework explicitly includes three stages: relative quality assessor, nonlinear mapping, and dataset-specific perceptual scale alignment, to jointly predict relative quality, perceptual quality, and subjective quality. Experiments are conducted on four publicly available datasets for VQA in the wild, i.e., LIVE-VQC, LIVE-Qualcomm, KoNViD-1k, and CVD2014. The experimental results verify the effectiveness of the mixed datasets training strategy and demonstrate the superior performance of the unified model in comparison with state-of-the-art models. For reproducible research, we make the PyTorch implementation of our method available at https://github.com/lidq92/MDTVSFA.
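To make the three-stage structure concrete, the following is a minimal sketch in plain Python, not the released PyTorch implementation. It assumes, purely for illustration, that the relative quality assessor is a simple mean over per-frame scores, that the nonlinear mapping is a logistic curve, and that the dataset-specific alignment is an affine map. All names (`DatasetScale`, `relative_quality`, `nonlinear_mapping`, `align_to_dataset`, `predict`) and the two placeholder datasets with their scale values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DatasetScale:
    """Hypothetical per-dataset alignment: subjective = scale * perceptual + shift."""
    scale: float
    shift: float


def relative_quality(frame_scores):
    """Stand-in for the relative quality assessor (assumption: mean of frame scores)."""
    return sum(frame_scores) / len(frame_scores)


def nonlinear_mapping(relative, slope=1.0, midpoint=0.0):
    """Map relative quality to perceptual quality in (0, 1) (assumption: logistic curve)."""
    return 1.0 / (1.0 + math.exp(-slope * (relative - midpoint)))


def align_to_dataset(perceptual, ds):
    """Align perceptual quality to one dataset's subjective rating scale (assumption: affine)."""
    return ds.scale * perceptual + ds.shift


def predict(frame_scores, ds):
    """Return (relative, perceptual, subjective) quality for one video."""
    relative = relative_quality(frame_scores)
    perceptual = nonlinear_mapping(relative)
    subjective = align_to_dataset(perceptual, ds)
    return relative, perceptual, subjective


# Placeholder rating scales; these are not the scales of any real dataset.
SCALES = {
    "dataset_a": DatasetScale(scale=4.0, shift=1.0),    # e.g. a 1-to-5 rating scale
    "dataset_b": DatasetScale(scale=100.0, shift=0.0),  # e.g. a 0-to-100 rating scale
}

if __name__ == "__main__":
    frames = [0.2, -0.1, 0.5]
    for name, ds in SCALES.items():
        print(name, predict(frames, ds))
```

In this sketch only the last stage depends on the dataset, so the relative and perceptual predictions for a video are shared across datasets, while the subjective prediction is expressed on each dataset's own rating scale.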
