Progressively Complementarity-Aware Fusion Network for RGB-D Salient Object Detection

How to incorporate cross-modal complementarity sufficiently is the cornerstone question for RGB-D salient object detection. Previous works mainly address this issue by simply concatenating multi-modal features or combining unimodal predictions. In this paper, we answer this question from two perspectives: (1) We argue that if the complementary part is modelled more explicitly, the cross-modal complement is likely to be better captured. To this end, we design a novel complementarity-aware fusion (CA-Fuse) module for convolutional neural networks (CNNs). By introducing cross-modal residual functions and complementarity-aware supervision in each CA-Fuse module, the problem of learning complementary information from the paired modality is explicitly posed as asymptotically approximating the residual function. (2) We explore the complement across all levels. By cascading the CA-Fuse modules and densely adding level-wise supervision from deep to shallow, the cross-level complement can be selected and combined progressively. The proposed RGB-D fusion network disambiguates both the cross-modal and cross-level fusion processes and yields more thorough fusion results. Experiments on public datasets demonstrate the effectiveness of the proposed CA-Fuse module and the overall RGB-D salient object detection network.
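
The following is a minimal sketch of the two ideas named in the abstract: a fusion block in which each modality is complemented by a residual estimated from the other modality, and dense level-wise (deep) supervision applied to the side outputs of the cascaded blocks. The layer names, channel sizes, concatenation-based fusion, and loss choice are illustrative assumptions for a simplified setting, not the authors' exact architecture.

```python
# Hypothetical PyTorch sketch of cross-modal residual fusion with
# complementarity-aware (auxiliary) supervision at each level.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CAFuse(nn.Module):
    """Simplified complementarity-aware fusion at one level (assumed form).

    Each modality's feature serves as an identity path; a small residual branch
    driven by the paired modality is added on top, so the branch only has to
    learn what the current modality is missing.
    """

    def __init__(self, channels):
        super().__init__()
        self.res_from_depth = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.res_from_rgb = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # 1x1 heads produce side-output saliency maps used for the
        # complementarity-aware supervision at this level.
        self.rgb_head = nn.Conv2d(channels, 1, 1)
        self.depth_head = nn.Conv2d(channels, 1, 1)
        self.fused_head = nn.Conv2d(2 * channels, 1, 1)

    def forward(self, f_rgb, f_depth):
        # Cross-modal residual functions: each modality is complemented by a
        # residual computed from the other modality.
        rgb_enhanced = f_rgb + self.res_from_depth(f_depth)
        depth_enhanced = f_depth + self.res_from_rgb(f_rgb)
        fused = torch.cat([rgb_enhanced, depth_enhanced], dim=1)
        side_outputs = {
            "rgb": self.rgb_head(rgb_enhanced),
            "depth": self.depth_head(depth_enhanced),
            "fused": self.fused_head(fused),
        }
        return fused, side_outputs


def deep_supervision_loss(side_outputs_per_level, gt):
    """Level-wise supervision from deep to shallow: every side output of every
    cascaded CA-Fuse block is compared against the resized ground-truth map."""
    loss = 0.0
    for side_outputs in side_outputs_per_level:
        for pred in side_outputs.values():
            target = F.interpolate(gt, size=pred.shape[-2:], mode="nearest")
            loss = loss + F.binary_cross_entropy_with_logits(pred, target)
    return loss
```

In this reading, the explicit residual formulation and the per-level side supervision are what make the complementary part of each modality, and of each level, an explicit learning target rather than an implicit by-product of feature concatenation.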
