Enhancing Temporal Quality Measurements in a Globally Deployed Streaming Video Quality Predictor

Most successful perceptual video quality assessment models are either frame-based or perform spatiotemporal filtering or motion estimation to model the temporal aspects of video distortions. While these models achieve good results on video quality databases, their increased computational complexity often leads video quality engineers to rely instead on simpler image-based quality algorithms. To balance prediction accuracy against compute efficiency, Netflix developed the Video Multi-method Assessment Fusion (VMAF) framework, an efficient feature-based system that combines multiple perception-based elementary image measurements to produce video quality predictions. However, the current version of VMAF only weakly captures the temporal features that are sensitive to perceived temporal video distortions. To address this, we propose an enhanced model, SpatioTemporal VMAF (ST-VMAF), that incorporates temporal features that are easy to compute. We demonstrate the improved performance of ST-VMAF on many subjective video databases. The proposed model will be made available as part of the open-source package at https://github.com/Netflix/vmaf.
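The abstract does not say which temporal features ST-VMAF adds. The sketch below is a hypothetical illustration of what an "easy to compute" temporal feature can look like, as opposed to motion estimation. It is not the ST-VMAF feature set.

It assumes the following:

* Frames are equal-sized grayscale (luma) images stored as nested Python lists.
* The reference and distorted videos have the same number of frames.

For each pair of consecutive frames, the sketch compares the reference video's frame difference with the distorted video's frame difference. It uses only subtraction and averaging.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale (luma) image stored as rows of pixel values.
Frame = List[List[float]]


def frame_difference(prev: Frame, curr: Frame) -> Frame:
    """Pixel-wise difference between two consecutive frames (curr - prev)."""
    return [[c - p for p, c in zip(prow, crow)] for prow, crow in zip(prev, curr)]


def mean_abs(frame: Frame) -> float:
    """Mean absolute pixel value over the whole frame."""
    total = sum(abs(v) for row in frame for v in row)
    count = sum(len(row) for row in frame)
    return total / count


def temporal_difference_feature(ref: Sequence[Frame], dis: Sequence[Frame]) -> List[float]:
    """Hypothetical per-frame temporal feature (illustration only, not ST-VMAF).

    For each transition t-1 -> t, compares the reference frame difference with
    the distorted frame difference and returns their mean absolute discrepancy.
    A value of 0 means the distorted video reproduces the reference's
    frame-to-frame changes exactly at that transition.
    """
    if len(ref) != len(dis):
        raise ValueError("reference and distorted videos must have the same number of frames")
    scores = []
    for t in range(1, len(ref)):
        d_ref = frame_difference(ref[t - 1], ref[t])
        d_dis = frame_difference(dis[t - 1], dis[t])
        # Pixel-wise discrepancy between the two temporal signals.
        discrepancy = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(d_ref, d_dis)]
        scores.append(mean_abs(discrepancy))
    return scores


if __name__ == "__main__":
    # Three 2x2 frames that brighten steadily.
    ref = [[[0, 0], [0, 0]], [[10, 10], [10, 10]], [[20, 20], [20, 20]]]
    # Distorted copy in which the first frame is repeated (a frame freeze).
    dis = [[[0, 0], [0, 0]], [[0, 0], [0, 0]], [[20, 20], [20, 20]]]
    print(temporal_difference_feature(ref, dis))  # [10.0, 10.0]
    print(temporal_difference_feature(ref, ref))  # [0.0, 0.0]
```

In the usage example, the frozen frame produces a nonzero discrepancy on both transitions it touches, and an undistorted copy scores zero. In a feature-fusion system such as VMAF, per-frame values like these would be pooled and combined with spatial features by a trained regressor. That fusion step is not shown here.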
