Deep Visual Discomfort Predictor for Stereoscopic 3D Images

Most prior approaches to the problem of stereoscopic 3D (S3D) visual discomfort prediction (VDP) have focused on extracting perceptually meaningful handcrafted features based on models of visual perception and of natural depth statistics. To advance performance on this problem, we have developed a deep learning-based VDP model named the deep visual discomfort predictor (DeepVDP). DeepVDP uses a convolutional neural network (CNN) to learn features that are highly predictive of experienced visual discomfort. Since training a CNN requires a large amount of labeled data, we develop a systematic way of dividing each S3D image into local regions, called patches, and train a patch-based CNN in two sequential stages. Because it is very difficult to obtain human opinions on each patch, each patch is instead assigned a proxy ground-truth label generated by an existing S3D visual discomfort prediction algorithm, 3D-VDP. These proxy labels are used in the first stage of training the CNN. In the second stage, the automatically learned local abstractions are aggregated into global features by a feature aggregation layer, and the patch-based CNN pretrained on proxy labels is retrained on true global subjective 3D discomfort scores, which serve as the ground-truth label for each S3D image; the learned features are iteratively updated through this supervised learning. The global S3D visual discomfort scores predicted by the trained DeepVDP model achieve state-of-the-art performance compared with previous VDP algorithms.
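The two-stage training scheme described above can be illustrated with a minimal pure-Python sketch. It rests on heavy simplifying assumptions and is not the DeepVDP implementation: a linear patch model stands in for the CNN, the input is a single 2-D map (for example, a disparity map) rather than a stereo pair, `proxy_label` is a hypothetical placeholder for 3D-VDP (it simply returns the patch's mean absolute disparity), and the feature aggregation layer is approximated by mean pooling of patch-level outputs. All function names are hypothetical.

```python
import random


def extract_patches(image, size, stride):
    """Split a 2-D list (e.g. a disparity map) into flattened size x size patches."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            patches.append([image[r + i][c + j] for i in range(size) for j in range(size)])
    return patches


def proxy_label(patch):
    """HYPOTHETICAL stand-in for 3D-VDP: mean absolute disparity of the patch."""
    return sum(abs(v) for v in patch) / len(patch)


def predict_patch(w, b, x):
    """Linear patch model (a stand-in for the patch-based CNN)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def aggregate(w, b, patches):
    """Aggregation sketched as mean pooling of patch outputs into one global score."""
    return sum(predict_patch(w, b, p) for p in patches) / len(patches)


def train_stage1(patches, labels, dim, lr=0.01, epochs=200, seed=0):
    """Stage 1: SGD on squared error against per-patch proxy labels."""
    rng = random.Random(seed)
    w, b = [0.0] * dim, 0.0
    idx = list(range(len(patches)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            err = predict_patch(w, b, patches[i]) - labels[i]
            w = [wi - lr * err * xi for wi, xi in zip(w, patches[i])]
            b -= lr * err
    return w, b


def train_stage2(w, b, images, scores, lr=0.01, epochs=200):
    """Stage 2: fine-tune the stage-1 weights on image-level subjective scores.

    With mean pooling, the global prediction equals w . mean(x_p) + b, so the
    gradient of the half squared error w.r.t. w is err * mean(x_p).
    """
    for _ in range(epochs):
        for patches, y in zip(images, scores):
            mean_x = [sum(col) / len(patches) for col in zip(*patches)]
            err = aggregate(w, b, patches) - y
            w = [wi - lr * err * mx for wi, mx in zip(w, mean_x)]
            b -= lr * err
    return w, b
```

Initializing stage 2 from the stage-1 weights is the point of the scheme: the plentiful proxy-labeled patches give the patch model a reasonable starting point, and the scarce image-level subjective scores then only need to adjust it.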
