Stereoscopic video quality assessment based on 3D convolutional neural networks

Abstract Research on stereoscopic video quality assessment (SVQA) plays an important role in promoting the development of stereoscopic video systems. Existing SVQA metrics rely on hand-crafted features, which are inaccurate and time-consuming because of the diversity and complexity of stereoscopic video distortions. This paper introduces a 3D convolutional neural network (CNN) based SVQA framework that takes cubic difference video patches as input and models both local spatio-temporal information and global temporal information. First, instead of using hand-crafted features, we design a 3D CNN architecture to automatically and effectively capture local spatio-temporal features. Then we employ a quality score fusion strategy that considers global temporal clues to obtain the final video-level predicted score. Extensive experiments conducted on two public stereoscopic video quality datasets show that the proposed method correlates highly with human perception and outperforms state-of-the-art methods by a large margin. We also show that our 3D CNN features have more desirable properties for SVQA than the hand-crafted features used in previous methods, and that combining our 3D CNN features with support vector regression (SVR) can further boost performance. In addition, because it requires no complex preprocessing and benefits from GPU acceleration, the proposed method is shown to be computationally efficient and easy to use.
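The pipeline summarized in the abstract (difference video, cubic patches, patch-level scores, video-level fusion) can be illustrated with a minimal pure-Python sketch. The abstract does not give the network or the fusion rule, so everything here is an assumption made for illustration: the function names, the non-overlapping patch layout, the `placeholder_patch_score` stand-in used in place of the trained 3D CNN, and the plain two-stage averaging used for fusion.

```python
from statistics import mean

def difference_video(left, right):
    """Per-pixel left-minus-right difference; videos are [T][H][W] nested lists."""
    return [[[l - r for l, r in zip(lrow, rrow)]
             for lrow, rrow in zip(lframe, rframe)]
            for lframe, rframe in zip(left, right)]

def cubic_patches(video, depth, size):
    """Yield (t0, cube) for non-overlapping depth x size x size cubes (assumed layout)."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    for t in range(0, T - depth + 1, depth):
        for y in range(0, H - size + 1, size):
            for x in range(0, W - size + 1, size):
                cube = [[row[x:x + size] for row in frame[y:y + size]]
                        for frame in video[t:t + depth]]
                yield t, cube

def placeholder_patch_score(cube):
    """Hypothetical stand-in for the trained 3D CNN: mean absolute difference."""
    return mean(abs(v) for frame in cube for row in frame for v in row)

def fuse_scores(scored):
    """Assumed fusion: average within each temporal segment, then across segments."""
    by_segment = {}
    for t0, score in scored:
        by_segment.setdefault(t0, []).append(score)
    return mean(mean(scores) for scores in by_segment.values())

def predict_video_score(left, right, depth=2, size=2):
    diff = difference_video(left, right)
    scored = [(t0, placeholder_patch_score(cube))
              for t0, cube in cubic_patches(diff, depth, size)]
    return fuse_scores(scored)
```

In a real system the placeholder scorer would be replaced by a trained 3D CNN that regresses a quality score from each cube, and the fusion step would follow whatever temporal weighting the paper actually specifies.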
