ChipQA: No-Reference Video Quality Prediction via Space-Time Chips

We propose a new model for no-reference video quality assessment (VQA). Our approach is built on a new idea of highly localized space-time (ST) slices called Space-Time Chips (ST Chips). ST Chips are localized cuts of video data along directions that implicitly capture motion. We first process the video data with perceptually motivated bandpass and normalization models, and then select oriented ST Chips based on how closely they fit parametric models of natural video statistics. We show that the parameters describing these statistics can be used to reliably predict video quality without the need for a reference video. The proposed method implicitly models ST video naturalness and deviations from naturalness. We train and test our model on several large VQA databases and show that it achieves state-of-the-art performance at reduced cost, without requiring motion computation.
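
To make the kind of processing described above concrete, here is a minimal sketch, not the authors' implementation: it applies divisive (MSCN-style) normalization to a single space-time slice and fits a generalized Gaussian distribution (GGD) to the normalized coefficients, whose parameters would then serve as quality-aware features. The function names, filter scale, grid-search GGD fit, and the random toy chip are illustrative assumptions, and the step that selects the chip orientation from the video volume is omitted.

```python
# Sketch only: MSCN-style normalization of a space-time slice plus a GGD fit.
# Names, filter scale, and the toy data are assumptions, not the authors' code.
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.special import gamma


def mscn_normalize(slice_2d, sigma=7 / 6, eps=1e-3):
    """Mean-subtracted, contrast-normalized (divisively normalized) coefficients."""
    mu = gaussian_filter(slice_2d, sigma)                 # local mean
    var = gaussian_filter(slice_2d ** 2, sigma) - mu ** 2  # local variance
    std = np.sqrt(np.maximum(var, 0.0))
    return (slice_2d - mu) / (std + eps)


def fit_ggd(coeffs):
    """Moment-matching estimate of GGD shape (alpha) and standard deviation."""
    coeffs = coeffs.ravel()
    sigma_sq = np.mean(coeffs ** 2)
    e_abs = np.mean(np.abs(coeffs))
    rho = sigma_sq / (e_abs ** 2 + 1e-12)
    # Invert rho = Gamma(1/a) * Gamma(3/a) / Gamma(2/a)^2 over a grid of shapes.
    alphas = np.arange(0.2, 10.0, 0.001)
    r = gamma(1 / alphas) * gamma(3 / alphas) / gamma(2 / alphas) ** 2
    alpha = alphas[np.argmin(np.abs(r - rho))]
    return alpha, np.sqrt(sigma_sq)


# Toy usage: a placeholder "chip" standing in for a small oriented ST cut of luminance.
chip = np.random.randn(40, 40)
alpha, sigma = fit_ggd(mscn_normalize(chip))
print(f"GGD shape={alpha:.2f}, scale={sigma:.3f}")  # features fed to a quality regressor
```

In practice such shape and scale parameters would be pooled over many chips (and over multiple bandpass channels) before being mapped to a quality score by a learned regressor.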
