Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation

Recent work has shown that state-of-the-art classifiers are quite brittle, in the sense that a small adversarial change to an input that was originally classified correctly with high confidence leads to a wrong classification, again with high confidence. This raises concerns that such classifiers are vulnerable to attacks and calls their use in safety-critical systems into question. In this paper we provide, for the first time, formal guarantees on the robustness of a classifier by deriving instance-specific lower bounds on the norm of the input manipulation required to change the classifier's decision. Based on this analysis we propose the Cross-Lipschitz regularization functional. We show that using this form of regularization in kernel methods and neural networks improves the robustness of the classifier without any loss in prediction performance.
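The abstract gives neither the guarantee nor the regularizer in formula form. As a minimal sketch, both quantities can be illustrated for a linear multi-class classifier f_j(x) = w_j^T x + b_j, where the class gradients are constant and closed forms exist: the smallest l2 perturbation that flips the decision away from the predicted class c is min over j != c of (f_c(x) - f_j(x)) / ||w_c - w_j||_2, and a Cross-Lipschitz-style penalty averages squared norms of pairwise gradient differences. The weights `W`, bias `b`, input `x`, and the exact normalization of the penalty below are illustrative assumptions, not taken from the paper.

```python
# Sketch only: instance-specific robustness lower bound and a
# Cross-Lipschitz-style penalty for a *linear* multi-class classifier.
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 10                      # number of classes, input dimension (placeholders)
W = rng.normal(size=(K, d))       # one weight row per class
b = rng.normal(size=K)            # bias vector
x = rng.normal(size=d)            # a single input

scores = W @ x + b
c = int(np.argmax(scores))        # predicted class

# Guaranteed l2 robustness at x: for linear classifiers the minimal
# perturbation turning class c into class j has norm
#   (f_c(x) - f_j(x)) / ||w_c - w_j||_2,
# so the guarantee is the minimum of this ratio over j != c.
gaps = scores[c] - np.delete(scores, c)
grad_diffs = np.linalg.norm(W[c] - np.delete(W, c, axis=0), axis=1)
robustness_lower_bound = np.min(gaps / grad_diffs)
print("guaranteed l2 robustness at x:", robustness_lower_bound)

# Cross-Lipschitz-style penalty: mean squared norm of pairwise gradient
# differences. For a linear model the gradients do not depend on x, so the
# average over training points reduces to a sum over class pairs.
pairwise = W[:, None, :] - W[None, :, :]          # shape (K, K, d)
cross_lipschitz_penalty = np.sum(pairwise ** 2) / K ** 2
print("Cross-Lipschitz penalty:", cross_lipschitz_penalty)
```

For nonlinear models such as kernel methods or neural networks the gradients vary with x, and the paper's guarantee instead bounds the gradient differences locally over a ball around x, so the simple closed form above no longer applies; the penalty, however, keeps the same gradient-difference structure evaluated at the training points.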
