Deep3DSaliency: Deep Stereoscopic Video Saliency Detection Model by 3D Convolutional Networks

Stereoscopic saliency detection plays an important role in various stereoscopic video processing applications. However, conventional stereoscopic video saliency detection methods mainly use independent low-level features instead of extracting them automatically, and thus, they ignore the intrinsic relationship between the spatial and temporal information. In this paper, we propose a novel stereoscopic video saliency detection method based on 3D convolutional neural networks, namely, deep 3D video saliency (Deep3DSaliency). The proposed network consists of two sub-models: spatiotemporal saliency model (STSM) and stereoscopic saliency aware model (SSAM). STSM directly takes three consecutive video frames as the input to extract visual spatiotemporal features, while SSAM attempts to further infer the depth and semantic features from the left and right video frames by shared parameters from STSM. The visual spatiotemporal features from STSM and the depth and semantic features from SSAM are learned by an alternating optimization scheme. Finally, all these saliency-related features are combined together for the final stereoscopic saliency detection via 3D deconvolution. Experimental results show the superior performance of the proposed model over other existing ones in saliency estimation for 3D video sequences.

[1]  Qiaosong Wang,et al.  GraB: Visual Saliency via Novel Graph Model and Background Priors , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Pietro Perona,et al.  Is bottom-up attention useful for object recognition? , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[3]  Ling Shao,et al.  Consistent Video Saliency Using Local Gradient Flow Optimization and Global Refinement , 2015, IEEE Transactions on Image Processing.

[4]  Qi Zhao,et al.  SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[5]  Weisi Lin,et al.  Saliency detection for stereoscopic images , 2013, 2013 Visual Communications and Image Processing (VCIP).

[6]  Huchuan Lu,et al.  Saliency Detection via Graph-Based Manifold Ranking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Lihi Zelnik-Manor,et al.  What Makes a Patch Distinct? , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Stan Sclaroff,et al.  Exploiting Surroundedness for Saliency Detection: A Boolean Map Approach , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Weisi Lin,et al.  Saliency-based stereoscopic image retargeting , 2016, Inf. Sci..

[10]  Rita Cucchiara,et al.  A deep multi-level network for saliency prediction , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[11]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[12]  Jie Yang,et al.  Saliency Detection by Fully Learning a Continuous Conditional Random Field , 2017, IEEE Transactions on Multimedia.

[13]  Ivan V. Bajic,et al.  Saliency-Aware Video Compression , 2014, IEEE Transactions on Image Processing.

[14]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Alan C. Bovik,et al.  Saliency Prediction on Stereoscopic Videos , 2014, IEEE Transactions on Image Processing.

[16]  Moncef Gabbouj,et al.  Spatiotemporal Saliency Estimation by Spectral Foreground Detection , 2018, IEEE Transactions on Multimedia.

[17]  Huchuan Lu,et al.  Salient object detection via global and local cues , 2015, Pattern Recognit..

[18]  Han Wang,et al.  Salient Object Detection With Spatiotemporal Background Priors for Video , 2017, IEEE Transactions on Image Processing.

[19]  Andrew Zisserman,et al.  Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Huchuan Lu,et al.  Deep networks for saliency detection via local estimation and global search , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Namho Hur,et al.  Stereoscopic 3D visual attention model considering comfortable viewing , 2012 .

[22]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[23]  Shiguang Shan,et al.  Adaptive Partial Differential Equation Learning for Visual Saliency Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[25]  Pedro A. Amado Assunção,et al.  A method to compute saliency regions in 3D video based on fusion of feature maps , 2015, 2015 IEEE International Conference on Multimedia and Expo (ICME).

[26]  Mingli Song,et al.  Manifold Ranking-Based Matrix Factorization for Saliency Detection , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[27]  Wen Gao,et al.  Probabilistic Multi-Task Learning for Visual Saliency Estimation in Video , 2010, International Journal of Computer Vision.

[28]  Ali Borji,et al.  Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study , 2013, IEEE Transactions on Image Processing.

[29]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[30]  Yueting Zhuang,et al.  DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection , 2015, IEEE Transactions on Image Processing.

[31]  Lorenzo Torresani,et al.  Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[32]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Liming Zhang,et al.  A Novel Multiresolution Spatiotemporal Saliency Detection Model and Its Applications in Image and Video Compression , 2010, IEEE Transactions on Image Processing.

[34]  Chunhong Pan,et al.  Building extraction from multi-source remote sensing images via deep deconvolution neural networks , 2016, 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS).

[35]  Linwei Ye,et al.  Saliency Detection for Unconstrained Videos Using Superpixel-Level Graph and Spatiotemporal Propagation , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[36]  Yizhou Yu,et al.  Deep Contrast Learning for Salient Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Jorma Laaksonen,et al.  Bottom-Up Fixation Prediction Using Unsupervised Hierarchical Models , 2016, ACCV Workshops.

[38]  Nuno Vasconcelos,et al.  Anomaly Detection and Localization in Crowded Scenes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Ken Chen,et al.  Stereoscopic Visual Attention Model for 3D Video , 2010, MMM.

[40]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[41]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[42]  Junle Wang,et al.  Computational Model of Stereoscopic 3D Visual Saliency , 2013, IEEE Transactions on Image Processing.

[43]  Tianming Liu,et al.  Learning to Predict Eye Fixations via Multiresolution Convolutional Neural Networks , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[44]  Yizhou Yu,et al.  Visual saliency based on multiscale deep features , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Weisi Lin,et al.  Saliency-Based Defect Detection in Industrial Images by Using Phase Spectrum , 2014, IEEE Transactions on Industrial Informatics.

[46]  Shuiwang Ji,et al.  Residual Deconvolutional Networks for Brain Electron Microscopy Image Segmentation , 2017, IEEE Transactions on Medical Imaging.

[47]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[48]  Jing Li,et al.  Visual Attention Modeling for Stereoscopic Video: A Benchmark and Computational Model , 2017, IEEE Transactions on Image Processing.

[49]  Ling Shao,et al.  Video Salient Object Detection via Fully Convolutional Networks , 2017, IEEE Transactions on Image Processing.

[50]  Zhou Wang,et al.  Video saliency incorporating spatiotemporal cues and uncertainty weighting , 2013, ICME.

[51]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[52]  Seunghoon Hong,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[53]  Fatih Murat Porikli,et al.  Saliency-aware geodesic video object segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Xuelong Li,et al.  Spatiochromatic Context Modeling for Color Saliency Analysis , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[55]  Weisi Lin,et al.  A Universal Framework for Salient Object Detection , 2016, IEEE Transactions on Multimedia.

[56]  Shuyuan Yang,et al.  Salient Region Detection via Discriminative Dictionary Learning and Joint Bayesian Inference , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[57]  Mei Han,et al.  Category-Independent Object-Level Saliency Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[58]  Mohan S. Kankanhalli,et al.  Online object tracking based on CNN with spatial-temporal saliency guided sampling , 2017, Neurocomputing.

[59]  Xiaogang Wang,et al.  Unsupervised Salience Learning for Person Re-identification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[60]  Panos Nasiopoulos,et al.  A learning-based visual saliency prediction model for stereoscopic 3D video (LBVS-3D) , 2016, Multimedia Tools and Applications.

[61]  Xiaogang Wang,et al.  Saliency detection by multi-context deep learning , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Hong Qin,et al.  Video Saliency Detection via Spatial-Temporal Fusion and Low-Rank Coherency Diffusion , 2017, IEEE Transactions on Image Processing.