Visual Attention Prediction for Stereoscopic Video by Multi-Module Fully Convolutional Network

Visual attention is an important mechanism of the human visual system (HVS), and numerous saliency detection algorithms have recently been designed for 2D images and video. However, research on fixation detection for stereoscopic video remains limited, and the task is challenging because depth and motion information are complicated to model. In this paper, we design a novel multi-module fully convolutional network (MM-FCN) for fixation detection in stereoscopic video. Specifically, we design a fully convolutional network for spatial saliency prediction (S-FCN), in which the initial spatial saliency map of the stereoscopic video is learned from an image database for object detection. Next, a fully convolutional network for temporal saliency prediction (T-FCN) is constructed by combining the saliency results from S-FCN with motion information from the video frames. Finally, a fully convolutional network for depth fixation prediction (D-FCN) computes the final fixation map of the stereoscopic video by learning depth features together with the spatiotemporal features from T-FCN. Experimental results show that the proposed MM-FCN predicts fixations for stereoscopic video more effectively and efficiently than other related fixation prediction methods.
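
The three-module cascade described above can be illustrated with a minimal data-flow sketch. It is not the trained network: each module is replaced by a single 1x1 convolution followed by a sigmoid, all weights are hypothetical placeholders rather than learned values, and the motion input is assumed to be a simple absolute frame difference, which the abstract does not specify. Only the ordering (S-FCN, then T-FCN, then D-FCN) and the inputs each stage receives are taken from the text.

```python
import math


def conv1x1_sigmoid(channels, weights, bias):
    """Apply a 1x1 convolution plus sigmoid over a list of HxW channel maps."""
    height, width = len(channels[0]), len(channels[0][0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            z = bias + sum(w * ch[y][x] for w, ch in zip(weights, channels))
            row.append(1.0 / (1.0 + math.exp(-z)))
        out.append(row)
    return out


def motion_map(prev_gray, curr_gray):
    """Assumed motion cue: absolute difference between consecutive frames."""
    return [[abs(c - p) for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev_gray, curr_gray)]


def s_fcn(rgb):
    """Stand-in for S-FCN: RGB channels -> initial spatial saliency map."""
    return conv1x1_sigmoid(rgb, [0.5, 0.5, 0.5], -0.5)  # placeholder weights


def t_fcn(spatial, motion):
    """Stand-in for T-FCN: spatial saliency + motion -> spatiotemporal map."""
    return conv1x1_sigmoid([spatial, motion], [1.0, 2.0], -1.0)  # placeholder weights


def d_fcn(spatiotemporal, depth):
    """Stand-in for D-FCN: spatiotemporal map + depth -> final fixation map."""
    return conv1x1_sigmoid([spatiotemporal, depth], [1.5, 1.0], -1.0)  # placeholder weights


def mm_fcn(rgb, prev_gray, curr_gray, depth):
    """Chain the three stand-in modules in the order given in the abstract."""
    spatial = s_fcn(rgb)
    spatiotemporal = t_fcn(spatial, motion_map(prev_gray, curr_gray))
    return d_fcn(spatiotemporal, depth)


if __name__ == "__main__":
    h, w = 2, 3
    rgb = [[[0.5] * w for _ in range(h)] for _ in range(3)]
    prev_gray = [[0.0] * w for _ in range(h)]
    curr_gray = [[0.0] * w for _ in range(h)]
    depth = [[0.0] * w for _ in range(h)]
    curr_gray[0][0] = 1.0  # one pixel that moves between frames
    depth[0][0] = 1.0      # the same pixel also stands out in depth
    for row in mm_fcn(rgb, prev_gray, curr_gray, depth):
        print([round(v, 3) for v in row])
```

Because every stand-in operates per pixel, the sketch accepts frames of any size, which mirrors the fully convolutional property. A real implementation would use deep convolutional stacks with learned parameters in place of these single-layer placeholders.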
