Full-Reference Video Quality Assessment Using Deep 3D Convolutional Neural Networks

We present a novel framework called Deep Video QUality Evaluator (DeepVQUE) for full-reference video quality assessment (FRVQA) using deep 3D convolutional neural networks (3D ConvNets). DeepVQUE complements traditional handcrafted-feature-based methods by using deep 3D ConvNet models for feature extraction. 3D ConvNets can extract spatio-temporal features of a video that are vital for video quality assessment (VQA). Most existing FRVQA approaches operate on the spatial and temporal domains independently, followed by pooling, and often ignore the crucial spatio-temporal relationship of intensities in natural videos. In this work, we pay special attention to the contribution of spatio-temporal dependencies in natural videos to quality assessment. Specifically, the proposed approach estimates the spatio-temporal quality of a video with respect to its pristine version by applying commonly used distance measures, such as the l1 or l2 norm, to the volume-wise 3D ConvNet features of the pristine and distorted videos. Spatial quality is estimated using off-the-shelf full-reference image quality assessment (FRIQA) methods. Overall video quality is estimated using support vector regression (SVR) applied to the spatio-temporal and spatial quality estimates. Additionally, we illustrate the ability of the proposed approach to localize distortions in space and time.
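For concreteness, the sketch below outlines the pipeline described above under several stated assumptions: a stock torchvision 3D ResNet (r3d_18) stands in for the paper's C3D-style feature extractor, SSIM stands in for the off-the-shelf FRIQA method, and the clip preparation, pooling, and SVR hyper-parameters are illustrative placeholders rather than the authors' choices.

    # Minimal sketch of the pipeline described above -- NOT the authors' implementation.
    # Assumptions: r3d_18 replaces the C3D-style backbone, SSIM replaces the chosen
    # FRIQA method, and pooling/SVR settings are placeholders.
    import numpy as np
    import torch
    from torchvision.models.video import r3d_18
    from skimage.metrics import structural_similarity as ssim
    from sklearn.svm import SVR

    # 3D ConvNet backbone used as a spatio-temporal feature extractor.
    backbone = r3d_18(weights="DEFAULT")
    backbone.fc = torch.nn.Identity()   # drop the classifier, keep pooled features
    backbone.eval()

    def spatiotemporal_distance(ref_clips, dis_clips, norm=2):
        """l1/l2 distance between volume-wise 3D ConvNet features.

        ref_clips, dis_clips: float tensors of shape (num_clips, 3, T, H, W),
        already normalized as the backbone expects (assumption).
        """
        with torch.no_grad():
            f_ref = backbone(ref_clips)   # (num_clips, D) features per volume
            f_dis = backbone(dis_clips)
        d = torch.norm(f_ref - f_dis, p=norm, dim=1)   # per-volume distance
        return d.mean().item()                          # pooled over volumes

    def spatial_quality(ref_frames, dis_frames):
        """Frame-wise FRIQA score; SSIM is used here as one off-the-shelf choice."""
        scores = [ssim(r, d, channel_axis=-1, data_range=255)
                  for r, d in zip(ref_frames, dis_frames)]
        return float(np.mean(scores))

    # Regress subjective scores (e.g., DMOS) from the two quality estimates.
    # X has one row per video: [spatio-temporal distance, spatial quality].
    def train_quality_regressor(X, dmos):
        reg = SVR(kernel="rbf", C=1.0, epsilon=0.1)   # hyper-parameters are assumptions
        return reg.fit(X, dmos)

At test time, the trained regressor maps the two-dimensional quality estimate of an unseen video to a predicted subjective score; because the spatio-temporal distance is computed per volume, the same intermediate values can also be inspected to localize distortions in space and time.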
