论文信息 - Motion Guided Attention for Video Salient Object Detection

Motion Guided Attention for Video Salient Object Detection

Video salient object detection aims at discovering the most visually distinctive objects in a video. How to effectively take object motion into consideration during video salient object detection is a critical issue. Existing state-of-the-art methods either do not explicitly model and harvest motion cues or ignore spatial contexts within optical flow images. In this paper, we develop a multi-task motion guided video salient object detection network, which learns to accomplish two sub-tasks using two sub-networks, one sub-network for salient object detection in still images and the other for motion saliency detection in optical flow images. We further introduce a series of novel motion guided attention modules, which utilize the motion saliency sub-network to attend and enhance the sub-network for still images. These two sub-networks learn to adapt to each other by end-to-end training. Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on a wide range of benchmarks. We hope our simple and effective approach will serve as a solid baseline and help ease future research in video salient object detection. Code and models will be made available.

[1] Xiaogang Wang,et al. Residual Attention Network for Image Classification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Christof Koch,et al. A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[3] Vladlen Koltun,et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[4] Guoqiang Han,et al. R³Net: Recurrent Residual Refinement Network for Saliency Detection , 2018, IJCAI.

[5] Martin Jägersand,et al. Video Segmentation using Teacher-Student Adaptation in a Human Robot Interaction (HRI) Setting , 2018, ArXiv.

[6] Qinghua Hu,et al. SG-FCN: A Motion and Memory-Based Deep Learning Model for Video Saliency Detection , 2018, IEEE Transactions on Cybernetics.

[7] Michael J. Black,et al. A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[8] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[9] Yuan Xie,et al. Instance-Level Salient Object Segmentation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Jitendra Malik,et al. Object Segmentation by Long Term Analysis of Point Trajectories , 2010, ECCV.

[11] Ling Shao,et al. Video Salient Object Detection via Fully Convolutional Networks , 2017, IEEE Transactions on Image Processing.

[12] Huchuan Lu,et al. A Stagewise Refinement Model for Detecting Salient Objects in Images , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[13] Wenguan Wang,et al. Shifting More Attention to Video Salient Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Sabine Süsstrunk,et al. Frequency-tuned salient region detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[15] Tao Li,et al. Structure-Measure: A New Way to Evaluate Foreground Maps , 2017, International Journal of Computer Vision.

[16] Liang Lin,et al. Interpretable Video Captioning via Trajectory Structured Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17] Jan Kautz,et al. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18] Ben Wang,et al. Reverse Attention for Salient Object Detection , 2018, ECCV.

[19] Zhuowen Tu,et al. Deeply Supervised Salient Object Detection with Short Connections , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Zhiming Luo,et al. Non-local Deep Features for Salient Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Luc Van Gool,et al. A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[23] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Chang-Su Kim,et al. Primary Object Segmentation in Videos Based on Region Augmentation and Reduction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Karteek Alahari,et al. Learning Motion Patterns in Videos , 2016, CVPR.

[26] Tao Mei,et al. Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Huchuan Lu,et al. Learning to Detect Salient Objects with Image-Level Supervision , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Xiaogang Wang,et al. Unsupervised Salience Learning for Person Re-identification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[29] Dinggang Shen,et al. Contour Knowledge Transfer for Salient Object Detection , 2018, ECCV.

[30] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[31] Alexander G. Schwing,et al. Unsupervised Video Object Segmentation using Motion Saliency-Guided Spatio-Temporal Propagation , 2018, ECCV.

[32] Martin Jägersand,et al. Video Object Segmentation using Teacher-Student Adaptation in a Human Robot Interaction (HRI) Setting , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[33] Kristen Grauman,et al. FusionSeg: Learning to Combine Motion and Appearance for Fully Automatic Segmentation of Generic Objects in Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34] Hong Qin,et al. Video Saliency Detection via Spatial-Temporal Fusion and Low-Rank Coherency Diffusion , 2017, IEEE Transactions on Image Processing.

[35] Laurent Itti,et al. Automatic foveation for video compression using a neurobiological model of visual attention , 2004, IEEE Transactions on Image Processing.

[36] Thomas Brox,et al. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37] Karteek Alahari,et al. Learning Video Object Segmentation with Visual Memory , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[38] Ling Shao,et al. Consistent Video Saliency Using Local Gradient Flow Optimization and Global Refinement , 2015, IEEE Transactions on Image Processing.

[39] Ming-Hsuan Yang,et al. PiCANet: Learning Pixel-Wise Contextual Attention for Saliency Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40] Huchuan Lu,et al. Amulet: Aggregating Multi-level Convolutional Features for Salient Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[41] Yuan Xie,et al. Flow Guided Recurrent Neural Encoder for Video Salient Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42] Trung-Nghia Le,et al. Video Salient Object Detection Using Spatiotemporal Deep Features , 2017, IEEE Transactions on Image Processing.

[43] Linwei Ye,et al. Saliency Detection for Unconstrained Videos Using Superpixel-Level Graph and Spatiotemporal Propagation , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[44] Huchuan Lu,et al. Detect Globally, Refine Locally: A Novel Approach to Saliency Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45] Thomas Brox,et al. FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[46] C.-C. Jay Kuo,et al. Unsupervised Video Object Segmentation with Motion-Based Bilateral Networks , 2018, ECCV.

[47] Sanyuan Zhao,et al. Pyramid Dilated Deeper ConvLSTM for Video Salient Object Detection , 2018, ECCV.

[48] Qin Huang,et al. Instance Embedding Transfer to Unsupervised Video Object Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49] Hefeng Wu,et al. Weighted attentional blocks for probabilistic object tracking , 2013, The Visual Computer.

[50] Huchuan Lu,et al. Learning Uncertain Convolutional Features for Accurate Saliency Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).