Object Segmentation by Mining Cross-Modal Semantics

Multi-sensor cues have shown promise for object segmentation, but the inherent noise of each sensor, together with calibration error in practice, may degrade segmentation accuracy. In this paper, we propose a novel approach that mines cross-modal semantics to guide the fusion and decoding of multimodal features, with the aim of controlling each modality's contribution based on relative entropy. We explore the semantics shared among the multimodal inputs in two aspects: modality-shared consistency and modality-specific variation. Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) a coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision. On the one hand, the AF block explicitly dissociates the shared and specific representations and learns to weight the modal contribution by adjusting the proportion, region, and pattern, depending on input quality. On the other hand, our CFD first decodes the shared feature and then refines the output through specificity-aware querying. Further, we enforce semantic consistency across the decoding layers to enable interaction across network hierarchies, improving feature discriminability. Extensive comparisons on eleven datasets with depth or thermal cues, and on two challenging tasks, namely salient and camouflaged object segmentation, validate the effectiveness of our method in terms of both performance and robustness.
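To make the idea of weighting a modality by relative entropy concrete, the following is a minimal toy sketch and not the XMSNet implementation. It assumes features are plain lists of floats, that the shared part is the elementwise mean and the specific part is the residual, and that the auxiliary (depth or thermal) modality is down-weighted as its distribution diverges from RGB. The function names, the `temperature` parameter, and the exponential gating rule are all hypothetical choices made for illustration.

```python
import math


def softmax(xs):
    """Turn a feature vector into a discrete probability distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D_KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def split_shared_specific(rgb, aux):
    """Toy decomposition (an assumption): shared = mean, specific = residual."""
    shared = [(r + a) / 2 for r, a in zip(rgb, aux)]
    rgb_specific = [r - s for r, s in zip(rgb, shared)]
    aux_specific = [a - s for a, s in zip(aux, shared)]
    return shared, rgb_specific, aux_specific


def fuse(rgb, aux, temperature=1.0):
    """Gate the auxiliary-specific residual by how much aux diverges from RGB.

    The weight is 1 when both distributions agree (fused = elementwise mean)
    and approaches 0 as they diverge (fused falls back to the RGB features).
    """
    divergence = kl_divergence(softmax(rgb), softmax(aux))
    w_aux = math.exp(-divergence / temperature)
    _, _, aux_specific = split_shared_specific(rgb, aux)
    fused = [r + w_aux * s for r, s in zip(rgb, aux_specific)]
    return fused, w_aux


if __name__ == "__main__":
    rgb_feat = [1.0, 2.0, 3.0]
    noisy_aux = [3.0, 2.0, 1.0]
    fused, weight = fuse(rgb_feat, noisy_aux)
    print("auxiliary weight:", weight)
    print("fused feature:", fused)
```

In this sketch, falling back to RGB when the two modalities disagree is only one possible policy; a learned fusion block could equally decide that the auxiliary modality is the more reliable one.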
