No-Reference Video Quality Assessment Using Natural Spatiotemporal Scene Statistics

Robust spatiotemporal representations of natural videos have several applications, including quality assessment, action recognition, and object tracking. In this paper, we propose a video representation based on a parameterized statistical model of the spatiotemporal statistics of mean-subtracted and contrast-normalized (MSCN) coefficients of natural videos. Specifically, we propose an asymmetric generalized Gaussian distribution (AGGD) to model the statistics of the MSCN coefficients of natural videos and of their spatiotemporal Gabor bandpass-filtered outputs. We then demonstrate that the AGGD model parameters are effective features for discriminating between distortions. Based on this observation, we propose a supervised learning approach using support vector regression (SVR) to address the no-reference video quality assessment (NRVQA) problem. We evaluate the proposed algorithm on publicly available video quality assessment (VQA) datasets containing both traditional (synthetic) and in-capture (authentic) distortions, and show that it delivers competitive performance on synthetic distortions and acceptable performance on authentic distortions. The code for our algorithm will be released at https://www.iith.ac.in/lfovia/downloads.html.
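The MSCN transform named in the abstract can be sketched as follows. This is a minimal per-frame (2-D) version, assuming the Gaussian-weighted local mean and standard deviation used in BRISQUE-style methods (a 7x7 window with sigma = 7/6 and a stabilizing constant C = 1 for 8-bit intensities). These parameters, the helper names `gaussian_window`, `local_filter`, and `mscn`, and the per-frame choice are illustrative assumptions; the abstract does not say whether normalization is done per frame or over 3-D spatiotemporal neighborhoods.

```python
import numpy as np


def gaussian_window(size: int = 7, sigma: float = 7 / 6) -> np.ndarray:
    """Normalized 1-D Gaussian kernel (assumed BRISQUE-style parameters)."""
    half = size // 2
    x = np.arange(-half, half + 1, dtype=np.float64)
    w = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()


def local_filter(img: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Separable Gaussian smoothing; edge padding keeps the output shape equal to img."""
    half = len(w) // 2
    padded = np.pad(img, half, mode="edge")
    # Filter along rows, then along columns; "valid" trims the padding back off.
    rows = np.apply_along_axis(lambda r: np.convolve(r, w, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, w, mode="valid"), 0, rows)


def mscn(frame: np.ndarray, c: float = 1.0) -> np.ndarray:
    """Mean-subtracted, contrast-normalized coefficients of one grayscale frame."""
    frame = np.asarray(frame, dtype=np.float64)
    w = gaussian_window()
    mu = local_filter(frame, w)                     # local weighted mean
    var = local_filter(frame * frame, w) - mu * mu  # local weighted variance
    sigma = np.sqrt(np.maximum(var, 0.0))           # clamp tiny negative round-off
    return (frame - mu) / (sigma + c)
```

Calling `mscn(frame)` on a 2-D luma array returns an array of the same shape. A flat region maps to coefficients near zero, since the local mean cancels the intensity and the local variance vanishes; the constant `c` keeps the division stable in such regions.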

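The AGGD parameters used as features in the abstract can be estimated from a set of MSCN (or bandpass-filtered) coefficients. The sketch below uses the moment-matching estimator popularized by BRISQUE, assuming (not confirming) that a similar estimator is used here. The function name `fit_aggd`, the search range [0.2, 10] for the shape parameter alpha, and the returned tuple layout are illustrative choices.

```python
import math


def fit_aggd(samples):
    """Moment-matching AGGD fit; returns (alpha, eta, sigma_l, sigma_r).

    alpha: shape, eta: mean, sigma_l / sigma_r: left / right scale parameters.
    `samples` must be a sized sequence (list, tuple, or 1-D array).
    """
    left = [x for x in samples if x < 0]
    right = [x for x in samples if x > 0]
    if not left or not right:
        raise ValueError("need both negative and positive samples")
    sigma_l = math.sqrt(sum(x * x for x in left) / len(left))
    sigma_r = math.sqrt(sum(x * x for x in right) / len(right))
    gamma_hat = sigma_l / sigma_r
    n = len(samples)
    r_hat = (sum(abs(x) for x in samples) / n) ** 2 / (sum(x * x for x in samples) / n)
    # Correct the generalized-Gaussian moment ratio for left/right asymmetry.
    big_r = r_hat * (gamma_hat ** 3 + 1) * (gamma_hat + 1) / (gamma_hat ** 2 + 1) ** 2

    def rho(a):
        # Theoretical ratio Gamma(2/a)^2 / (Gamma(1/a) * Gamma(3/a)) for shape a.
        return math.gamma(2 / a) ** 2 / (math.gamma(1 / a) * math.gamma(3 / a))

    # Grid search over alpha in [0.2, 10] (assumed range) for the closest ratio.
    grid = (0.2 + 0.001 * k for k in range(9801))
    alpha = min(grid, key=lambda a: (rho(a) - big_r) ** 2)
    eta = (sigma_r - sigma_l) * math.gamma(2 / alpha) / math.sqrt(
        math.gamma(1 / alpha) * math.gamma(3 / alpha))
    return alpha, eta, sigma_l, sigma_r
```

A Laplacian-distributed input should yield alpha close to 1 and a Gaussian one alpha close to 2, with eta near zero whenever the two tails have equal scale. In the pipeline the abstract describes, parameters like these, collected from MSCN coefficients and their spatiotemporal Gabor bandpass outputs, form the feature vector that the SVR maps to a quality score.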