Lightweight Detection of Out-of-Distribution and Adversarial Samples via Channel Mean Discrepancy

Detecting out-of-distribution (OOD) and adversarial samples is essential when deploying classification models in real-world applications. We introduce Channel Mean Discrepancy (CMD), a model-agnostic distance metric, inspired by integral probability metrics, for evaluating the statistics of features extracted by classification models. With minimal overhead, CMD compares the feature statistics of incoming samples against feature statistics estimated from previously seen training samples. We experimentally demonstrate that the CMD magnitude is significantly smaller for legitimate samples than for OOD and adversarial samples. Using CMD, we propose a simple method to reliably distinguish legitimate samples from OOD and adversarial samples, requiring only a single forward pass through a pre-trained classification model per sample. We further show how to achieve single-image detection by using a lightweight model for channel sensitivity tuning, an improvement over other statistical detection methods. Preliminary results show that our simple yet effective method outperforms several state-of-the-art approaches to detecting OOD and adversarial samples across various datasets and attack methods, with high efficiency and generalizability.
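
The abstract does not spell out the exact form of the statistic, but the core idea, comparing per-channel feature means of an incoming sample against reference means estimated from training data, can be sketched as follows. This is a minimal illustration in PyTorch, not the paper's exact definition: it assumes a model whose model.features(x) call returns an (N, C, H, W) feature map, and the names fit_reference_stats and cmd_score, as well as the mean-absolute-difference aggregation, are illustrative choices.

import torch
import torch.nn as nn

def channel_means(features: torch.Tensor) -> torch.Tensor:
    # Average each channel over its spatial dimensions: (N, C, H, W) -> (N, C).
    return features.mean(dim=(2, 3))

@torch.no_grad()
def fit_reference_stats(model: nn.Module, loader) -> torch.Tensor:
    # Estimate per-channel reference means from previously seen training data.
    total, count = None, 0
    for x, _ in loader:
        means = channel_means(model.features(x))  # hypothetical feature extractor
        total = means.sum(0) if total is None else total + means.sum(0)
        count += means.shape[0]
    return total / count  # reference mean per channel, shape (C,)

@torch.no_grad()
def cmd_score(model: nn.Module, x: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    # One forward pass per batch of samples; a larger discrepancy
    # suggests an OOD or adversarial input.
    means = channel_means(model.features(x))  # (N, C)
    return (means - ref).abs().mean(dim=1)    # (N,)

In this sketch, a sample would be flagged when cmd_score exceeds a threshold calibrated on held-out legitimate data; in practice the statistics would be tracked at one or more intermediate layers, and the per-channel differences could be reweighted according to the channel sensitivities the abstract mentions.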
