DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection

Because moving objects tend to draw more human visual attention, temporal motion information is commonly exploited alongside spatial information to detect salient objects in videos. Although efficient tools such as optical flow have been proposed to extract temporal motion information, they often encounter difficulties in saliency detection because of camera movement or partial movement of the salient objects. In this paper, we investigate the complementary roles of spatial and temporal information and propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of spatiotemporal information. We construct a symmetric two-bypass network to explicitly extract spatial and temporal features. A dynamic weight generator (DWG) is designed to automatically learn the reliability of the corresponding saliency branch, and a top-down cross attentive aggregation (CAA) procedure is designed to facilitate dynamic complementary aggregation of spatiotemporal features. Finally, the features are modified by spatial attention under the guidance of a coarse saliency map and then passed through the decoder to produce the final saliency map. Experimental results on five benchmarks (VOS, DAVIS, FBMS, SegTrack-v2, and ViSal) demonstrate that the proposed method outperforms state-of-the-art algorithms. The source code is available at https://github.com/TJUMMG/DS-Net.
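The idea of weighting each branch by its estimated reliability can be illustrated with a toy sketch. This is not the DS-Net implementation: the abstract does not specify how the DWG computes its weights, so the helper names (`branch_weight`, `fuse`) and the `scale` and `bias` parameters below are hypothetical stand-ins for learned components, and the feature maps are plain nested lists rather than network tensors.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def global_average(feature):
    """Mean over all positions of a 2-D feature map (a list of rows)."""
    values = [v for row in feature for v in row]
    return sum(values) / len(values)


def branch_weight(feature, scale, bias):
    """Hypothetical reliability score in (0, 1) for one branch.

    `scale` and `bias` stand in for parameters that a real model
    would learn; they are assumptions made for this illustration.
    """
    return sigmoid(scale * global_average(feature) + bias)


def fuse(spatial, temporal, w_s, w_t):
    """Weighted sum of two equally sized maps, normalised by w_s + w_t."""
    total = w_s + w_t
    return [
        [(w_s * s + w_t * t) / total for s, t in zip(row_s, row_t)]
        for row_s, row_t in zip(spatial, temporal)
    ]


# Toy case: the temporal branch is empty (e.g. no usable motion cue),
# so its hypothetical reliability score is lower than the spatial one.
spatial = [[0.9, 0.1], [0.8, 0.2]]
temporal = [[0.0, 0.0], [0.0, 0.0]]
w_s = branch_weight(spatial, scale=4.0, bias=-1.0)
w_t = branch_weight(temporal, scale=4.0, bias=-1.0)
fused = fuse(spatial, temporal, w_s, w_t)
```

In this toy setting the fused map leans toward the spatial branch because its score is higher. A real model would predict such weights from the features themselves and apply them at several decoder levels.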
