Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection

Existing industrial anomaly detection (IAD) methods predict anomaly scores for both anomaly detection and localization. However, they struggle to support multi-turn dialog or to provide detailed descriptions of anomaly regions, e.g., the color, shape, and category of industrial anomalies. Recently, large multimodal (i.e., vision and language) models (LMMs) have shown strong perception abilities on multiple vision tasks such as image captioning, visual understanding, and visual reasoning, making them a competitive choice for more comprehensible anomaly detection. However, existing general LMMs lack knowledge about anomaly detection, while training a specific LMM for anomaly detection requires a tremendous amount of annotated data and massive computation resources. In this paper, we propose a novel large multimodal model that applies vision experts to industrial anomaly detection (dubbed Myriad), which yields definite anomaly detection and high-quality anomaly descriptions. Specifically, we adopt MiniGPT-4 as the base LMM and design an Expert Perception module to embed the prior knowledge from vision experts as tokens that are intelligible to Large Language Models (LLMs). To compensate for the errors and confusions of vision experts, we introduce a domain adapter to bridge the visual representation gap between generic and industrial images. Furthermore, we propose a Vision Expert Instructor, which enables the Q-Former to generate IAD-domain vision-language tokens according to the vision expert prior. Extensive experiments on the MVTec-AD and VisA benchmarks demonstrate that our proposed method not only performs favorably against state-of-the-art methods under the 1-class and few-shot settings, but also provides definite anomaly predictions along with detailed descriptions in the IAD domain.
