A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks

Detecting test samples drawn sufficiently far away from the training distribution, whether statistically or adversarially, is a fundamental requirement for deploying a good classifier in many real-world machine learning applications. However, deep neural networks with the softmax classifier are known to produce highly overconfident posterior distributions even for such abnormal samples. In this paper, we propose a simple yet effective method for detecting any abnormal samples, which is applicable to any pre-trained softmax neural classifier. We obtain class-conditional Gaussian distributions over the (low- and upper-level) features of the deep model under Gaussian discriminant analysis, which yield a confidence score based on the Mahalanobis distance. While most prior methods have been evaluated for detecting either out-of-distribution or adversarial samples, but not both, the proposed method achieves state-of-the-art performance in both cases in our experiments. Moreover, we find that the proposed method is more robust in harsh cases, e.g., when the training dataset has noisy labels or only a small number of samples. Finally, we show that the proposed method enjoys broader usage by applying it to class-incremental learning: whenever out-of-distribution samples are detected, our classification rule can incorporate new classes well without further training of the deep models.
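The core of the method is a Mahalanobis-distance-based confidence score computed from class-conditional Gaussians with a shared (tied) covariance fitted to deep features. Below is a minimal NumPy sketch of that score under these assumptions; the feature extractor, input pre-processing, and the layer-wise ensemble used in the paper are omitted, and the function names (`fit_class_gaussians`, `mahalanobis_confidence`) are ours for illustration only.

```python
import numpy as np

def fit_class_gaussians(features, labels):
    """Fit class-conditional Gaussian means with a shared (tied) covariance,
    as in Gaussian discriminant analysis.

    features: (N, D) array of penultimate-layer features
    labels:   (N,) array of integer class labels
    """
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    # Tied covariance: average outer product of deviations from class means.
    centered = np.vstack([features[labels == c] - means[c] for c in classes])
    cov = centered.T @ centered / features.shape[0]
    precision = np.linalg.pinv(cov)  # pseudo-inverse for numerical stability
    return means, precision

def mahalanobis_confidence(x, means, precision):
    """Confidence score for a test feature x: negative of the minimum
    class-wise Mahalanobis distance to the class means."""
    dists = [(x - mu) @ precision @ (x - mu) for mu in means.values()]
    return -min(dists)

# Illustrative usage with synthetic features (not real data):
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 8))
labels = rng.integers(0, 4, size=200)
means, precision = fit_class_gaussians(feats, labels)
score = mahalanobis_confidence(rng.normal(size=8), means, precision)
```

In the paper, such scores are computed at multiple intermediate layers and combined with weights tuned on a small validation set; a test input is declared abnormal when its combined score falls below a threshold.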
