AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models

Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the ability to understand images and have achieved remarkable performance on various visual tasks. Although extensive training data lets them recognize common objects well, they lack domain-specific knowledge and have a weaker understanding of localized details within objects, which limits their effectiveness in Industrial Anomaly Detection (IAD). Meanwhile, most existing IAD methods only provide anomaly scores and require thresholds to be set manually to separate normal from abnormal samples, which restricts their practical deployment. In this paper, we explore the use of LVLMs to address the IAD problem and propose AnomalyGPT, a novel IAD approach based on an LVLM. We generate training data by simulating anomalous images and producing a corresponding textual description for each image. We also employ an image decoder to provide fine-grained semantics and design a prompt learner to fine-tune the LVLM using prompt embeddings. AnomalyGPT eliminates the need for manual threshold adjustment and thus directly assesses the presence and location of anomalies. Additionally, AnomalyGPT supports multi-turn dialogues and exhibits impressive few-shot in-context learning capabilities. With only one normal shot, AnomalyGPT achieves state-of-the-art performance on the MVTec-AD dataset, with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3%. Code is available at https://github.com/CASIA-IVA-Lab/AnomalyGPT.
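
The difference between the metrics quoted above can be illustrated with a small sketch. It is not taken from the AnomalyGPT code base: the function names `roc_auc` and `accuracy_at` and the toy scores are assumptions made purely for illustration. The point it shows is that AUC is computed from the ranking of anomaly scores and needs no threshold, whereas accuracy for a score-based detector depends on where the threshold is placed.

```python
import numpy as np

def roc_auc(scores, labels):
    """Threshold-free AUC: the fraction of (anomalous, normal) pairs in which
    the anomalous sample receives the higher score (ties count as half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]       # scores of anomalous samples
    neg = scores[~labels]      # scores of normal samples
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def accuracy_at(scores, labels, threshold):
    """Accuracy of a score-based detector once a threshold has been chosen."""
    preds = np.asarray(scores, dtype=float) >= threshold
    return float((preds == np.asarray(labels, dtype=bool)).mean())

# Hypothetical anomaly scores for six images (1 = anomalous, 0 = normal).
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.80]
labels = [0, 0, 0, 1, 1, 1]

print("AUC:", roc_auc(scores, labels))
for t in (0.15, 0.38, 0.50):
    print(f"accuracy at threshold {t}:", accuracy_at(scores, labels, t))
```

On this toy data the ranking is perfect, so the AUC is 1.0, yet the accuracy changes with the chosen threshold. A model that answers directly whether an anomaly is present, as the abstract describes, avoids that manual choice.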
