Spatiotemporal Feature Integration and Model Fusion for Full Reference Video Quality Assessment

The recently developed video multi-method assessment fusion (VMAF) framework integrates multiple quality-aware features to predict video quality accurately. However, VMAF does not yet exploit important principles of temporal perception that are relevant to perceptual video distortion measurement. Here, we propose two improvements to the VMAF framework, called spatiotemporal VMAF and ensemble VMAF, which leverage perceptually motivated space–time features that are efficiently calculated at multiple scales. We also conducted a large subjective video study, which we have found to be an excellent resource for training our feature-based approaches. In rigorous experiments, we found that the proposed algorithms demonstrate state-of-the-art performance on multiple video applications. The compared algorithms will be made available as part of the open-source package at https://github.com/Netflix/vmaf.
