OODformer: Out-Of-Distribution Detection Transformer

A serious problem in image classification is that a trained model may perform well on input data drawn from the same distribution as the training data, yet perform much worse on out-of-distribution (OOD) samples. In real-world safety-critical applications in particular, it is important to know whether a new data point is OOD. To date, OOD detection has typically been addressed using confidence scores, autoencoder-based reconstruction, or contrastive learning. However, the global image context has not yet been explored to discriminate the non-local objectness between in-distribution and OOD samples. This paper proposes a first-of-its-kind OOD detection architecture, named OODformer, that leverages the contextualization capabilities of the transformer. Incorporating the transformer as the principal feature extractor allows us to exploit object concepts and their discriminative attributes, along with their co-occurrence, via visual attention. Based on the contextualized embedding, we demonstrate OOD detection using both class-conditioned latent-space similarity and a network confidence score. Our approach shows improved generalizability across various datasets, and we achieve new state-of-the-art results on CIFAR-10/-100 and ImageNet-30. Code is available at: https://github.com/rajatkoner08/oodformer.
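The two detection scores mentioned above can be illustrated with a minimal Python sketch. This is not the paper's exact implementation: it assumes the transformer's [CLS] embeddings and classifier logits have already been extracted, and the tied-covariance Gaussian modeling of the class-conditioned latent space, as well as all function names, are our own illustrative choices.

```python
import numpy as np
from scipy.special import softmax


def fit_class_conditional_gaussians(embeddings, labels):
    """Estimate per-class means and a shared (tied) covariance from
    in-distribution [CLS] embeddings (N x D) and integer labels (N,)."""
    classes = np.unique(labels)
    means = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    centered = embeddings - means[labels]            # center each sample by its class mean
    cov = centered.T @ centered / len(embeddings)    # shared covariance across classes
    precision = np.linalg.pinv(cov)                  # pseudo-inverse for numerical stability
    return means, precision


def latent_similarity_ood_score(test_embeddings, means, precision):
    """Class-conditioned latent-space score: Mahalanobis distance to the
    closest class mean. Larger values suggest the sample is more likely OOD."""
    diffs = test_embeddings[:, None, :] - means[None, :, :]        # (N, C, D)
    dists = np.einsum('ncd,de,nce->nc', diffs, precision, diffs)   # squared distances (N, C)
    return dists.min(axis=1)


def confidence_ood_score(logits):
    """Confidence-based score: 1 - max softmax probability (higher = more OOD)."""
    return 1.0 - softmax(logits, axis=1).max(axis=1)
```

In practice, one would fit the class-conditional statistics once on in-distribution training embeddings and then threshold either score on test samples to flag OOD inputs.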
