Deep Visual Saliency on Stereoscopic Images

Visual saliency on stereoscopic 3D (S3D) images has been shown to be heavily influenced by image quality. This dependency is therefore an important factor in image quality prediction, image restoration, and discomfort reduction, but such a nonlinear relation remains very difficult to predict. In addition, most algorithms specialized in detecting visual saliency on pristine images may fail when facing distorted images. In this paper, we investigate a deep learning scheme named Deep Visual Saliency (DeepVS) to achieve a more accurate and reliable saliency predictor even in the presence of distortions. Since, from a psychophysical point of view, visual saliency is influenced by low-level features (contrast, luminance, and depth information), we propose seven low-level features derived from S3D image pairs and use them in a deep learning framework to detect visual attention in a way that adapts to human perception. Our analysis shows that the low-level features play a role in extracting both distortion and saliency information. To construct saliency predictors, we weight and model human visual saliency through two different network architectures: a regression network and a fully convolutional neural network. Results from thorough experiments confirm that the predicted saliency maps are up to 70% correlated with human gaze patterns, which emphasizes the need for hand-crafted features as input to deep neural networks in S3D saliency detection.
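
As an illustration of what a correlation between a predicted saliency map and human gaze patterns can mean in practice, the following minimal sketch computes the Pearson linear correlation coefficient between two maps. The abstract does not state which correlation measure was used, so the choice of Pearson correlation is an assumption, and the function name `pearson_cc` and the array inputs are hypothetical.

```python
import numpy as np


def pearson_cc(pred_map, gaze_map):
    """Pearson correlation between a predicted saliency map and a gaze density map.

    Assumption: both inputs are 2D arrays of the same shape. The metric is a
    common choice in saliency evaluation, not necessarily the one used here.
    """
    pred = np.asarray(pred_map, dtype=np.float64)
    gaze = np.asarray(gaze_map, dtype=np.float64)
    if pred.shape != gaze.shape:
        raise ValueError("maps must have the same shape")

    # Center both maps so that only spatial co-variation is measured.
    pred_c = pred.ravel() - pred.mean()
    gaze_c = gaze.ravel() - gaze.mean()

    denom = np.sqrt((pred_c ** 2).sum() * (gaze_c ** 2).sum())
    if denom == 0.0:
        # A constant map carries no spatial information; report zero correlation.
        return 0.0
    return float((pred_c * gaze_c).sum() / denom)
```

Under this measure, a score of 0.7 would indicate a strong but imperfect linear agreement between the predicted map and the gaze density map; a value of 1 would require the two maps to match up to a positive scale and an offset.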
