Detecting Adversarial Examples and Other Misclassifications in Neural Networks by Introspection

Despite achieving excellent performance on a wide variety of tasks, modern neural networks are unable to provide a reliable confidence value that would allow misclassifications to be detected. This limitation is at the heart of what is known as an adversarial example, where the network assigns a wrong prediction with high confidence to a slightly modified image. Moreover, this overconfidence issue has also been observed for regular errors and for out-of-distribution data. We tackle this problem by what we call introspection, i.e. using the information provided by the logits of an already pretrained neural network. We show that by training a simple 3-layer neural network on top of the logit activations, we are able to detect misclassifications at a competitive level.
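A minimal sketch of the introspection idea described above, assuming PyTorch: a small 3-layer MLP reads the frozen, pretrained classifier's logits and predicts whether the classifier's top-1 prediction is wrong. The layer widths, optimizer settings, and binary cross-entropy objective are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class IntrospectionDetector(nn.Module):
    """3-layer MLP operating on the logits of a pretrained classifier."""

    def __init__(self, num_classes: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # score for "this prediction is a misclassification"
        )

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        return self.net(logits).squeeze(-1)


def train_detector(classifier, loader, num_classes, epochs=5, device="cpu"):
    """Train the detector on (logits, is-misclassified) pairs from a frozen classifier."""
    classifier.eval().to(device)
    detector = IntrospectionDetector(num_classes).to(device)
    opt = torch.optim.Adam(detector.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                logits = classifier(x)                   # pretrained network, kept frozen
            wrong = (logits.argmax(dim=1) != y).float()  # 1 if misclassified, else 0
            opt.zero_grad()
            loss = loss_fn(detector(logits), wrong)
            loss.backward()
            opt.step()
    return detector
```

At test time, the detector's output (after a sigmoid) can be thresholded to flag likely misclassifications, including adversarial and out-of-distribution inputs, without modifying or retraining the original classifier.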
