Flow driven attention network for video salient object detection

Salient object detection has recently been revolutionised by convolutional neural networks (CNNs). However, state-of-the-art still-image saliency detectors are hard to transfer directly to videos, because they ignore the temporal context between frames. In this study, the authors propose a flow-driven attention network (FDAN) that exploits motion information for video salient object detection. FDAN consists of an appearance feature extractor, a motion-guided attention module and a saliency map regression module. These respectively extract the appearance feature of each frame, refine the appearance feature with optical flow, and infer the final saliency map. The motion-guided attention module is the core of FDAN: it extracts motion information in the form of attention. This attention mechanism is a two-branch CNN that fuses optical flow and appearance features. In addition, a shortcut connection is applied to the attention-multiplied feature map to further suppress noise. Experimental results show that the proposed method achieves performance on par with the state-of-the-art flow-guided recurrent neural encoder (FGRNE) on the challenging Densely Annotated Video Segmentation (DAVIS) and Freiburg-Berkeley Motion Segmentation (FBMS) benchmarks, while being twice as fast in detection.
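
The attention-plus-shortcut step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-branch CNN is replaced by a hypothetical per-pixel weighted sum of a single-channel appearance feature and an optical-flow magnitude, and the names `motion_guided_attention`, `w_app`, `w_flow` and `bias` are assumptions introduced only for illustration. What it does show is the general pattern of multiplying a feature map by an attention map and adding the original feature back through a shortcut.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def motion_guided_attention(appearance, flow_mag, w_app=1.0, w_flow=1.0, bias=0.0):
    """Refine a single-channel appearance feature map with motion attention.

    appearance: H x W nested list of floats (appearance feature, one channel).
    flow_mag:   H x W nested list of floats (optical-flow magnitude).
    w_app, w_flow, bias: hypothetical scalar weights standing in for the
        learned two-branch CNN that fuses the two inputs in the paper.
    """
    refined = []
    for a_row, m_row in zip(appearance, flow_mag):
        out_row = []
        for a, m in zip(a_row, m_row):
            # Assumed fusion: a weighted sum squashed to (0, 1) as attention.
            attention = sigmoid(w_app * a + w_flow * m + bias)
            # Attention-multiplied feature plus shortcut (identity) connection.
            out_row.append(a * attention + a)
        refined.append(out_row)
    return refined


if __name__ == "__main__":
    appearance = [[0.0, 2.0], [2.0, 2.0]]
    flow_mag = [[0.0, 0.0], [3.0, -3.0]]
    for row in motion_guided_attention(appearance, flow_mag, w_app=0.0):
        print(row)
```

Because the attention value lies in (0, 1), the shortcut keeps every positive feature between one and two times its original value: regions with little motion evidence are left close to the appearance feature instead of being zeroed out, while regions with strong motion evidence are amplified.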
