AdaptiveClick: Clicks-aware Transformer with Adaptive Focal Loss for Interactive Image Segmentation

Interactive Image Segmentation (IIS) has emerged as a promising technique for decreasing annotation time. Substantial progress has been made in pre- and post-processing for IIS, but the critical issue of interaction ambiguity, which notably hinders segmentation quality, has been under-researched. To address this, we introduce AdaptiveClick, a clicks-aware transformer incorporating an adaptive focal loss, which tackles annotation inconsistencies with tools for mask- and pixel-level ambiguity resolution. To the best of our knowledge, AdaptiveClick is the first transformer-based, mask-adaptive segmentation framework for IIS. The key ingredient of our method is the Clicks-aware Mask-adaptive Transformer Decoder (CAMD), which enhances the interaction between clicks and image features. Additionally, AdaptiveClick enables pixel-adaptive differentiation of hard and easy samples in the decision space, independent of their varying distributions. This is primarily achieved by optimizing a generalized Adaptive Focal Loss (AFL) with a theoretical guarantee, in which two adaptive coefficients control the ratio of gradient values for hard and easy pixels. Our analysis reveals that the commonly used Focal and BCE losses can be considered special cases of the proposed AFL. With a plain ViT backbone, extensive experiments on nine datasets demonstrate the superiority of AdaptiveClick over state-of-the-art methods. Code will be publicly available at https://github.com/lab206/AdaptiveClick.
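
To make the relationship between the baseline losses concrete, the following minimal sketch implements the standard per-pixel BCE and Focal losses (not the proposed AFL, whose adaptive coefficients are not specified in this abstract). It illustrates two well-known facts that the AFL is said to generalize: the Focal loss reduces to BCE when its focusing parameter is zero, and it down-weights easy pixels much more strongly than hard ones. The function names `bce` and `focal` and the sample probabilities are illustrative assumptions.

```python
import math


def bce(p, y):
    """Binary cross-entropy for one pixel; p is the predicted foreground probability."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal(p, y, gamma=2.0):
    """Standard Focal loss for one pixel: BCE scaled by (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p
    return (1.0 - p_t) ** gamma * -math.log(p_t)


# Two foreground pixels (assumed values): one easy, one hard.
easy_p, hard_p = 0.95, 0.30

# With gamma = 0 the modulating factor is 1, so Focal equals BCE.
bce_equals_focal = abs(focal(hard_p, 1, gamma=0.0) - bce(hard_p, 1)) < 1e-12

# Fraction of the BCE loss that survives the Focal modulation for each pixel.
easy_ratio = focal(easy_p, 1) / bce(easy_p, 1)  # (1 - 0.95) ** 2
hard_ratio = focal(hard_p, 1) / bce(hard_p, 1)  # (1 - 0.30) ** 2

print(bce_equals_focal, easy_ratio < hard_ratio)
```

In this sketch the focusing parameter is a fixed constant shared by every pixel; the abstract describes the AFL as replacing such fixed weighting with two adaptive coefficients.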
