ICNet: Information Conversion Network for RGB-D Based Salient Object Detection

RGB-D based salient object detection (SOD) methods leverage the depth map as a valuable complementary information for better SOD performance. Previous methods mainly resort to exploit the correlation between RGB image and depth map in three fusion domains: input images, extracted features, and output results. However, these fusion strategies cannot fully capture the complex correlation between the RGB image and depth map. Besides, these methods do not fully explore the cross-modal complementarity and the cross-level continuity of information, and treat information from different sources without discrimination. In this paper, to address these problems, we propose a novel Information Conversion Network (ICNet) for RGB-D based SOD by employing the siamese structure with encoder-decoder architecture. To fuse high-level RGB and depth features in an interactive and adaptive way, we propose a novel Information Conversion Module (ICM), which contains concatenation operations and correlation layers. Furthermore, we design a Cross-modal Depth-weighted Combination (CDC) block to discriminate the cross-modal features from different sources and to enhance RGB features with depth features at each level. Extensive experiments on five commonly tested datasets demonstrate the superiority of our ICNet over 15 state-of-the-art RGB-D based SOD methods, and validate the effectiveness of the proposed ICM and CDC block.

[1]  Weisi Lin,et al.  Saliency detection for stereoscopic images , 2013, 2013 Visual Communications and Image Processing (VCIP).

[2]  Zhi Liu,et al.  Salient region detection for stereoscopic images , 2014, 2014 19th International Conference on Digital Signal Processing.

[3]  Wenbin Zou,et al.  Saliency Tree: A Novel Saliency Detection Framework , 2014, IEEE Transactions on Image Processing.

[4]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[6]  Jitendra Malik,et al.  Learning Rich Features from RGB-D Images for Object Detection and Segmentation , 2014, ECCV.

[7]  Qingming Huang,et al.  Saliency Detection for Stereoscopic Images Based on Depth Confidence Analysis and Multiple Cues Fusion , 2016, IEEE Signal Processing Letters.

[8]  Yang Cao,et al.  Contrast Prior and Fluid Pyramid Integration for RGBD Salient Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Jiandong Tian,et al.  RGBD Salient Object Detection via Deep Fusion , 2016, IEEE Transactions on Image Processing.

[10]  Ran Ju,et al.  Depth saliency based on anisotropic center-surround difference , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[11]  Huchuan Lu,et al.  Visual Tracking via Coarse and Fine Structural Local Sparse Appearance Models , 2016, IEEE Transactions on Image Processing.

[12]  Yang Wang,et al.  Cross-Modal Self-Attention Network for Referring Image Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Yun Liu,et al.  DNA: Deeply Supervised Nonlinear Aggregation for Salient Object Detection , 2019, IEEE Transactions on Cybernetics.

[14]  Michael Ying Yang,et al.  Exploiting global priors for RGB-D saliency detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[15]  Tao Li,et al.  Structure-Measure: A New Way to Evaluate Foreground Maps , 2017, International Journal of Computer Vision.

[16]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[17]  Huan Du,et al.  Depth-Aware Salient Object Detection and Segmentation via Multiscale Discriminative Saliency Fusion and Bootstrap Learning , 2017, IEEE Transactions on Image Processing.

[18]  Nick Barnes,et al.  Learning RGB-D Salient Object Detection Using Background Enclosure, Depth Contrast, and Top-Down Features , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[19]  Léon Bottou,et al.  Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.

[20]  Zheng Lin,et al.  Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[21]  Youfu Li,et al.  Progressively Complementarity-Aware Fusion Network for RGB-D Salient Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Lihi Zelnik-Manor,et al.  How to Evaluate Foreground Maps , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Xiaochun Cao,et al.  Depth Enhanced Saliency Detection Method , 2014, ICIMCS '14.

[24]  Zhi Liu,et al.  Constrained fixation point based segmentation via deep neural network , 2019, Neurocomputing.

[25]  Huchuan Lu,et al.  Dual Deep Network for Visual Tracking , 2016, IEEE Transactions on Image Processing.

[26]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[27]  Ronggang Wang,et al.  An Innovative Salient Object Detection Using Center-Dark Channel Prior , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[28]  Bo Ren,et al.  Enhanced-alignment Measure for Binary Foreground Map Evaluation , 2018, IJCAI.

[29]  Haibin Ling,et al.  Saliency Detection on Light Field , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Xueqing Li,et al.  Leveraging stereopsis for saliency analysis , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Ming-Ming Cheng,et al.  EGNet: Edge Guidance Network for Salient Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[32]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33]  Wei Zhang,et al.  Salient object detection for RGB-D image by single stream recurrent convolution neural network , 2019, Neurocomputing.

[34]  Jiangjiang Liu,et al.  Salient Objects in Clutter: Bringing Salient Object Detection to the Foreground , 2018, ECCV.

[35]  Youfu Li,et al.  Three-Stream Attention-Aware Network for RGB-D Salient Object Detection , 2019, IEEE Transactions on Image Processing.

[36]  Junwei Han,et al.  CNNs-Based RGB-D Saliency Detection via Cross-View Transfer and Multiview Fusion. , 2018, IEEE transactions on cybernetics.

[37]  Tongwei Ren,et al.  Salient object detection for RGB-D image via saliency evolution , 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).

[38]  Nick Barnes,et al.  Local Background Enclosure for RGB-D Salient Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Zhi Liu,et al.  Effective online refinement for video object segmentation , 2019, Multimedia Tools and Applications.

[40]  Ling Shao,et al.  RANet: Ranking Attention Network for Fast Video Object Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[41]  Wenguan Wang,et al.  Shifting More Attention to Video Salient Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Xiaojin Gong,et al.  Adaptive Fusion for RGB-D Salient Object Detection , 2019, IEEE Access.

[43]  Rongrong Ji,et al.  RGBD Salient Object Detection: A Benchmark and Algorithms , 2014, ECCV.

[44]  Wei Liu,et al.  Improving Video Saliency Detection via Localized Estimation and Spatiotemporal Refinement , 2018, IEEE Transactions on Multimedia.

[45]  Xiangyang Wang,et al.  Depth-aware saliency detection using convolutional neural networks , 2019, J. Vis. Commun. Image Represent..

[46]  Carsten Rother,et al.  Deep Object Co-Segmentation , 2018, ACCV.

[47]  Wei Ji,et al.  Depth-Induced Multi-Scale Recurrent Attention Network for Saliency Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[48]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[49]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[50]  Dan Su,et al.  Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection , 2019, Pattern Recognit..