Three-Stream Attention-Aware Network for RGB-D Salient Object Detection

Previous RGB-D fusion systems based on convolutional neural networks typically employ a two-stream architecture, in which RGB and depth inputs are learned independently. The multi-modal fusion stage is typically performed by concatenating the deep features from each stream in the inference process. The traditional two-stream architecture might experience insufficient multi-modal fusion due to two following limitations: 1) the cross-modal complementarity is rarely studied in the bottom–up path, wherein we believe the cross-modal complements can be combined to learn new discriminative features to enlarge the RGB-D representation community and 2) the cross-modal channels are typically combined by undifferentiated concatenation, which appears ambiguous to selecting cross-modal complementary features. In this paper, we address these two limitations by proposing a novel three-stream attention-aware multi-modal fusion network. In the proposed architecture, a cross-modal distillation stream, accompanying the RGB-specific and depth-specific streams, is introduced to extract new RGB-D features in each level in the bottom–up path. Furthermore, the channel-wise attention mechanism is innovatively introduced to the cross-modal cross-level fusion problem to adaptively select complementary feature maps from each modality in each level. Extensive experiments report the effectiveness of the proposed architecture and the significant improvement over the state-of-the-art RGB-D salient object detection methods.

[1]  Wenbin Zou,et al.  Saliency Tree: A Novel Saliency Detection Framework , 2014, IEEE Transactions on Image Processing.

[2]  Xueqing Li,et al.  Leveraging stereopsis for saliency analysis , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Stephen Lin,et al.  Object-based RGBD image co-segmentation with mutex constraint , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Xin Zhao,et al.  Locality-Sensitive Deconvolution Networks with Gated Fusion for RGB-D Indoor Semantic Segmentation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Yang Wang,et al.  Salient Object Segmentation via Effective Integration of Saliency and Objectness , 2017, IEEE Transactions on Multimedia.

[6]  Shijian Lu,et al.  Discriminative Multi-modal Feature Fusion for RGBD Indoor Scene Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Jiangjiang Liu,et al.  Salient Objects in Clutter: Bringing Salient Object Detection to the Foreground , 2018, ECCV.

[8]  Junwei Han,et al.  DHSNet: Deep Hierarchical Saliency Network for Salient Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  K. Madhava Krishna,et al.  Depth really Matters: Improving Visual Salient Region Detection with Depth , 2013, BMVC.

[10]  Tat-Seng Chua,et al.  SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  James M. Rehg,et al.  An In Depth View of Saliency , 2013, BMVC.

[12]  Zhuowen Tu,et al.  Deeply Supervised Salient Object Detection with Short Connections , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Xiaochun Cao,et al.  Depth Enhanced Saliency Detection Method , 2014, ICIMCS '14.

[14]  Changsheng Xu,et al.  A generic virtual content insertion system based on visual attention analysis , 2008, ACM Multimedia.

[15]  Jiwen Lu,et al.  MMSS: Multi-modal Sharable and Specific Feature Learning for RGB-D Object Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[17]  Seungyong Lee,et al.  RDFNet: RGB-D Multi-level Residual Feature Fusion for Indoor Semantic Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[18]  Mengke Huang,et al.  RGBD Co-saliency Detection via Bagging-Based Clustering , 2016, IEEE Signal Processing Letters.

[19]  Harish Katti,et al.  Depth Matters: Influence of Depth Cues on Visual Saliency , 2012, ECCV.

[20]  Michael Ying Yang,et al.  Exploiting global priors for RGB-D saliency detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[21]  Xiaochun Cao,et al.  Audio Visual Attribute Discovery for Fine-Grained Object Recognition , 2018, AAAI.

[22]  Ran Shi,et al.  Salient object segmentation based on depth-aware image layering , 2018, Multimedia Tools and Applications.

[23]  Ran Ju,et al.  Depth saliency based on anisotropic center-surround difference , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[24]  Tao Li,et al.  Structure-Measure: A New Way to Evaluate Foreground Maps , 2017, International Journal of Computer Vision.

[25]  Dan Su,et al.  Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection , 2019, Pattern Recognit..

[26]  Youfu Li,et al.  Progressively Complementarity-Aware Fusion Network for RGB-D Salient Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Tongwei Ren,et al.  Salient object detection for RGB-D image via saliency evolution , 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).

[28]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[29]  Nick Barnes,et al.  Local Background Enclosure for RGB-D Salient Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Qingming Huang,et al.  Saliency Detection for Stereoscopic Images Based on Depth Confidence Analysis and Multiple Cues Fusion , 2016, IEEE Signal Processing Letters.

[32]  Rongrong Ji,et al.  RGBD Salient Object Detection: A Benchmark and Algorithms , 2014, ECCV.

[33]  Qingming Huang,et al.  An Iterative Co-Saliency Framework for RGBD Images , 2017, IEEE Transactions on Cybernetics.

[34]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[35]  Runmin Cong,et al.  Co-Saliency Detection for RGBD Images Based on Multi-Constraint Feature Matching and Cross Label Propagation. , 2018, IEEE transactions on image processing : a publication of the IEEE Signal Processing Society.

[36]  Xiaofei Zhou,et al.  Saliency integration driven by similar images , 2018, J. Vis. Commun. Image Represent..

[37]  Junwei Han,et al.  CNNs-Based RGB-D Saliency Detection via Cross-View Transfer and Multiview Fusion. , 2018, IEEE transactions on cybernetics.

[38]  Ronggang Wang,et al.  An Innovative Salient Object Detection Using Center-Dark Channel Prior , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[39]  Daniel Cohen-Or,et al.  Cascaded Feature Network for Semantic Segmentation of RGB-D Images , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[40]  Weisi Lin,et al.  Saliency detection for stereoscopic images , 2013, 2013 Visual Communications and Image Processing (VCIP).

[41]  Ali Borji,et al.  Salient Object Detection: A Benchmark , 2015, IEEE Transactions on Image Processing.

[42]  Dan Su,et al.  Attention-Aware Cross-Modal Cross-Level Fusion Network for RGB-D Salient Object Detection , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[43]  Jitendra Malik,et al.  Learning Rich Features from RGB-D Images for Object Detection and Segmentation , 2014, ECCV.

[44]  Zhi Liu,et al.  Salient region detection for stereoscopic images , 2014, 2014 19th International Conference on Digital Signal Processing.

[45]  Huan Du,et al.  Depth-Aware Salient Object Detection and Segmentation via Multiscale Discriminative Saliency Fusion and Bootstrap Learning , 2017, IEEE Transactions on Image Processing.

[46]  Ge Li,et al.  A multilayer backpropagation saliency detection algorithm and its applications , 2018, Multimedia Tools and Applications.

[47]  Stephen Lin,et al.  Object-Based Multiple Foreground Segmentation in RGBD Video , 2017, IEEE Transactions on Image Processing.

[48]  Alex Graves,et al.  Recurrent Models of Visual Attention , 2014, NIPS.

[49]  Jiandong Tian,et al.  RGBD Salient Object Detection via Deep Fusion , 2016, IEEE Transactions on Image Processing.

[50]  Gang Sun,et al.  Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.