Learning to Explore Intrinsic Saliency for Stereoscopic Video

The human visual system excels at biasing stereoscopic visual signals through attention mechanisms. Traditional methods for stereoscopic video saliency prediction, which rely on low-level features and depth-related information, have fundamental limitations. For example, the interactions among multiple visual cues, including spatial, temporal, and depth information, are complex and therefore cumbersome to model explicitly. In this paper, we argue that high-level features are crucial and resort to a deep learning framework to learn the saliency maps of stereoscopic videos. Driven by the spatio-temporal coherence of consecutive frames, the model first imitates the saliency mechanism with a 3D convolutional neural network. Subsequently, the saliency originating from intrinsic depth is derived from the correlations between the left and right views in a data-driven manner. Finally, a Convolutional Long Short-Term Memory (Conv-LSTM) based fusion network is developed to model the instantaneous interactions between spatio-temporal and depth attributes, producing the final stereoscopic saliency maps over time. Moreover, we establish a new large-scale stereoscopic video saliency (SVS) dataset, comprising 175 stereoscopic video sequences with fixation density annotations, to comprehensively study the intrinsic attributes of stereoscopic video saliency detection. Extensive experiments show that the proposed model achieves superior performance compared with state-of-the-art methods on the newly built stereoscopic video dataset.
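As a rough illustration of what "correlations between left and right views" can mean, the sketch below builds a horizontal correlation volume over one row of features and picks the best-matching disparity per position. It is not the network described in the abstract: the function names `horizontal_correlation` and `best_disparity`, the list-based feature layout, and the zero fill for out-of-range positions are all assumptions made for this example.

```python
from typing import List

Row = List[List[float]]  # one image row of features, indexed [x][channel]


def horizontal_correlation(left: Row, right: Row, max_disp: int) -> List[List[float]]:
    """Return volume[d][x]: channel-averaged dot product of left[x] and right[x - d].

    Positions where x - d falls outside the right view are filled with 0.0
    (an assumption of this sketch).
    """
    width = len(left)
    channels = len(left[0])
    volume = []
    for d in range(max_disp + 1):
        scores = []
        for x in range(width):
            xr = x - d  # a point at x in the left view is sought at x - d in the right view
            if xr < 0:
                scores.append(0.0)
            else:
                dot = sum(l * r for l, r in zip(left[x], right[xr]))
                scores.append(dot / channels)
        volume.append(scores)
    return volume


def best_disparity(volume: List[List[float]]) -> List[int]:
    """For each x, return the disparity with the highest correlation (ties go to the smallest)."""
    width = len(volume[0])
    return [max(range(len(volume)), key=lambda d: volume[d][x]) for x in range(width)]
```

In a learned model, a volume like this would typically be computed on convolutional feature maps and passed to further layers rather than reduced with a hard argmax; the argmax here only makes the toy result easy to inspect.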

[1]  Dmitriy Vatolin,et al.  Semiautomatic visual-attention modeling and its application to video compression , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[2]  Lorenzo Torresani,et al.  Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[3]  Weisi Lin,et al.  Saliency detection for stereoscopic images , 2013, 2013 Visual Communications and Image Processing (VCIP).

[4]  Alan C. Bovik,et al.  Saliency Prediction on Stereoscopic Videos , 2014, IEEE Transactions on Image Processing.

[5]  Matthias Bethge,et al.  Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet , 2014, ICLR.

[6]  Panos Nasiopoulos,et al.  A learning-based visual saliency prediction model for stereoscopic 3D video (LBVS-3D) , 2016, Multimedia Tools and Applications.

[7]  Wei Liu,et al.  Long-Term Human Motion Prediction by Modeling Motion Context and Enhancing Motion Dynamic , 2018, IJCAI.

[8]  Tie Liu,et al.  DeepVS: A Deep Learning Based Video Saliency Prediction Approach , 2018, ECCV.

[9]  John K. Tsotsos,et al.  An attentional framework for stereo vision , 2005, The 2nd Canadian Conference on Computer and Robot Vision (CRV'05).

[10]  Aidong Men,et al.  Video saliency detection incorporating temporal information in compressed domain , 2015, Signal Process. Image Commun..

[11]  Jing Li,et al.  Visual Attention Modeling for Stereoscopic Video: A Benchmark and Computational Model , 2017, IEEE Transactions on Image Processing.

[12]  Ling Shao,et al.  Video Salient Object Detection via Fully Convolutional Networks , 2017, IEEE Transactions on Image Processing.

[13]  Zhou Wang,et al.  Video saliency incorporating spatiotemporal cues and uncertainty weighting , 2013, ICME.

[14]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[15]  Xu Wang,et al.  Deep Learning Features Inspired Saliency Detection of 3D Images , 2016, PCM.

[16]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[17]  Markus Vincze,et al.  Learning What Matters: Combining Probabilistic Models of 2D and 3D Saliency Cues , 2011, ICVS.

[18]  Frédo Durand,et al.  What Do Different Evaluation Metrics Tell Us About Saliency Models? , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  한보형,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015 .

[20]  Michael Dorr,et al.  Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Christel Chamaret,et al.  Adaptive 3D rendering based on region-of-interest , 2010, Electronic Imaging.

[22]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[23]  Rita Cucchiara,et al.  Predicting Human Eye Fixations via an LSTM-Based Saliency Attentive Model , 2016, IEEE Transactions on Image Processing.

[24]  Harish Katti,et al.  Depth Matters: Influence of Depth Cues on Visual Saliency , 2012, ECCV.

[25]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[26]  Junle Wang,et al.  An eye tracking database for stereoscopic video , 2014, 2014 Sixth International Workshop on Quality of Multimedia Experience (QoMEX).

[27]  Qi Zhao,et al.  SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[28]  Ali Douik,et al.  A spatiotemporal model for video saliency detection , 2016, 2016 International Image Processing, Applications and Systems (IPAS).

[29]  Liqing Zhang,et al.  Saliency Detection: A Spectral Residual Approach , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  R. Watt Scanning from coarse to fine spatial scales in the human visual system after the onset of a stimulus. , 1987, Journal of the Optical Society of America. A, Optics and image science.

[31]  Pietro Perona,et al.  Graph-Based Visual Saliency , 2006, NIPS.

[32]  Xiaoyan Sun,et al.  Learning to Detect Video Saliency With HEVC Features , 2017, IEEE Transactions on Image Processing.

[33]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[34]  Ali Borji,et al.  Revisiting Video Saliency: A Large-Scale Benchmark and a New Model , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Yan Liu,et al.  Video Saliency Detection via Dynamic Consistent Spatio-Temporal Attention Modelling , 2013, AAAI.

[36]  Lihi Zelnik-Manor,et al.  Context-Aware Saliency Detection , 2012, IEEE Trans. Pattern Anal. Mach. Intell..