Idealised Bayesian Neural Networks Cannot Have Adversarial Examples: Theoretical and Empirical Study

We prove that idealised discriminative Bayesian neural networks, capturing perfect epistemic uncertainty, cannot have adversarial examples: techniques for crafting adversarial examples will necessarily fail to generate perturbed images which fool the classifier. This suggests why MC dropout-based techniques have been observed to be fairly robust to adversarial examples. We support our claims mathematically and empirically. We experiment with HMC on synthetic data derived from MNIST, for which we know the ground-truth image density, showing that near-perfect epistemic uncertainty correlates with density under the image manifold, and that adversarial images lie off this manifold. Using these insights, we suggest a new attack on MC dropout-based models that searches for imperfections in their uncertainty estimates, and we also suggest a mitigation. Lastly, we demonstrate our mitigation on a cats-vs-dogs image classification task with a VGG13 variant.
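The abstract refers repeatedly to MC dropout-based epistemic uncertainty. As a minimal sketch (not the authors' code), the snippet below shows one common way such uncertainty is computed in practice: keep dropout active at test time, draw several stochastic forward passes, and split predictive entropy into an expected-entropy term and a mutual-information term, the latter being the epistemic component typically thresholded to flag adversarial inputs. The PyTorch `model` and the sample count are assumptions for illustration.

```python
# Minimal sketch of MC dropout predictive uncertainty (assumed setup, not the paper's code).
import torch
import torch.nn.functional as F


def mc_dropout_uncertainty(model, x, n_samples=30):
    """Return mean class probabilities, predictive entropy, and mutual information."""
    model.train()  # keep dropout layers stochastic at test time
    with torch.no_grad():
        probs = torch.stack(
            [F.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )                                            # (n_samples, batch, classes)
    mean_probs = probs.mean(dim=0)                   # (batch, classes)
    # Total uncertainty: entropy of the averaged predictive distribution.
    predictive_entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(-1)
    # Aleatoric part: average entropy of individual stochastic predictions.
    expected_entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean(0)
    # Epistemic part: mutual information between prediction and model parameters.
    mutual_information = predictive_entropy - expected_entropy
    return mean_probs, predictive_entropy, mutual_information
```

Under the paper's argument, an idealised model would assign high mutual information to off-manifold perturbations, so a detector could reject inputs whose epistemic uncertainty exceeds a validation-calibrated threshold.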
