Detecting adversarial examples with inductive Venn-ABERS predictors

Inductive Venn-ABERS predictors (IVAPs) are a type of probabilistic predictors with the theoretical guarantee that their predictions are perfectly calibrated. We propose to exploit this calibration property for the detection of adversarial examples in binary classification tasks. By rejecting predictions if the uncertainty of the IVAP is too high, we obtain an algorithm that is both accurate on the original test set and significantly more robust to adversarial examples. The method appears to be competitive to the state of the art in adversarial defense, both in terms of robustness as well as scalability

[1]  Kouichi Sakurai,et al.  One Pixel Attack for Fooling Deep Neural Networks , 2017, IEEE Transactions on Evolutionary Computation.

[2]  Nina Narodytska,et al.  Simple Black-Box Adversarial Perturbations for Deep Networks , 2016, ArXiv.

[3]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[4]  Somesh Jha,et al.  Objective Metrics and Gradient Descent Algorithms for Adversarial Examples in Machine Learning , 2017, ACSAC.

[5]  Fabio Roli,et al.  Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning , 2018, CCS.

[6]  Jun Zhu,et al.  Boosting Adversarial Attacks with Momentum , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  David A. Wagner,et al.  Towards Evaluating the Robustness of Neural Networks , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[8]  W. Brendel,et al.  Foolbox: A Python toolbox to benchmark the robustness of machine learning models , 2017 .

[9]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[10]  Vladimir Vovk,et al.  Large-scale probabilistic predictors with and without guarantees of validity , 2015, NIPS.

[11]  Jon Howell,et al.  Asirra: a CAPTCHA that exploits interest-aligned manual image categorization , 2007, CCS '07.

[12]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[13]  Roland Vollgraf,et al.  Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms , 2017, ArXiv.

[14]  Seyed-Mohsen Moosavi-Dezfooli,et al.  DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Aaron C. Courville,et al.  Deep Learning Vector Quantization , 2016, ESANN.

[16]  J. Doug Tygar,et al.  Adversarial machine learning , 2019, AISec '11.

[17]  Aleksander Madry,et al.  Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.

[18]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[19]  W. Youden,et al.  Index for rating diagnostic tests , 1950, Cancer.