Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers

Vision transformers have recently shown strong global context modeling capabilities in camouflaged object detection (COD). However, they suffer from two major limitations: less effective locality modeling and insufficient feature aggregation in decoders. Both hinder camouflaged object detection, which depends on subtle cues extracted from nearly indistinguishable backgrounds. To address these issues, we propose a novel transformer-based Feature Shrinkage Pyramid Network (FSPNet), which hierarchically decodes locality-enhanced neighboring transformer features through progressive shrinking. Specifically, we propose a non-local token enhancement module (NL-TEM) that uses the non-local mechanism to let neighboring tokens interact and explores graph-based high-order relations within tokens to enhance the local representations of transformers. Moreover, we design a feature shrinkage decoder (FSD) with adjacent interaction modules (AIMs), which progressively aggregates adjacent transformer features through a layer-by-layer shrinkage pyramid to accumulate as many imperceptible but effective cues as possible for object information decoding. Extensive quantitative and qualitative experiments demonstrate that the proposed model significantly outperforms 24 existing competitors on three challenging COD benchmark datasets under six widely used evaluation metrics. Our code is publicly available at https://github.com/ZhouHuang23/FSPNet.
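
The layer-by-layer shrinkage described in the abstract can be illustrated with a minimal Python sketch. It is not the released FSPNet implementation: the names `average_merge`, `shrink_once`, and `shrinkage_pyramid` are hypothetical, features are plain lists of floats rather than token tensors, an element-wise average stands in for the learned adjacent interaction module, and the pairing of non-overlapping neighbors is an assumption made for illustration.

```python
from typing import Callable, List

Feature = List[float]  # stand-in for one transformer layer's features


def average_merge(a: Feature, b: Feature) -> Feature:
    """Hypothetical placeholder for an AIM: element-wise average of two
    adjacent features. The real module is learned; this is only a stand-in."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def shrink_once(features: List[Feature],
                merge: Callable[[Feature, Feature], Feature]) -> List[Feature]:
    """Merge non-overlapping pairs of adjacent features (assumed pairing)."""
    merged = [merge(features[i], features[i + 1])
              for i in range(0, len(features) - 1, 2)]
    if len(features) % 2 == 1:
        merged.append(features[-1])  # carry an unpaired last feature forward
    return merged


def shrinkage_pyramid(features: List[Feature],
                      merge: Callable[[Feature, Feature], Feature] = average_merge
                      ) -> List[List[Feature]]:
    """Return every pyramid level, from the input features down to one feature."""
    if not features:
        raise ValueError("at least one feature is required")
    levels = [features]
    while len(levels[-1]) > 1:
        levels.append(shrink_once(levels[-1], merge))
    return levels


# Four toy one-dimensional "features" shrink as 4 -> 2 -> 1.
levels = shrinkage_pyramid([[0.0], [2.0], [4.0], [6.0]])
level_sizes = [len(level) for level in levels]
final_feature = levels[-1][0]
```

Here `level_sizes` is `[4, 2, 1]` and `final_feature` is `[3.0]`. Each level keeps information from every input feature, which is the point of aggregating adjacent features progressively instead of fusing them all in a single step.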
