Segment Any Anomaly without Training via Hybrid Prompt Regularization

We present Segment Any Anomaly + (SAA+), a framework for zero-shot anomaly segmentation that uses hybrid prompt regularization to improve the adaptability of modern foundation models. Existing anomaly segmentation models typically rely on domain-specific fine-tuning, which limits their generalization across countless anomaly patterns. Inspired by the strong zero-shot generalization of foundation models such as Segment Anything, we first explore assembling them to leverage diverse multi-modal prior knowledge for anomaly localization. To adapt these foundation models to anomaly segmentation without tuning any parameters, we further introduce hybrid prompts, derived from domain expert knowledge and target image context, as regularization. SAA+ achieves state-of-the-art performance in the zero-shot setting on several anomaly segmentation benchmarks, including VisA, MVTec-AD, MTD, and KSDD2. We will release the code at https://github.com/caoyunkang/Segment-Any-Anomaly.
