Adversarial Risk and the Dangers of Evaluating Against Weak Attacks

This paper investigates recently proposed approaches for defending against adversarial examples and for evaluating adversarial robustness. The existence of adversarial examples in trained neural networks reflects the fact that expected risk alone does not capture a model's performance on worst-case inputs. We motivate the use of adversarial risk as an objective, even though it cannot easily be computed exactly. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate for the true adversarial risk. This suggests that models may optimize this surrogate rather than the true adversarial risk, appearing robust only because they are obscured to the particular attacks used for evaluation. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to reduce the accuracy of several recently proposed defenses to near zero. Our hope is that our formulations and results will help researchers to develop more powerful defenses.
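
The adversarial risk mentioned above can be written down concretely. One common formalization, assuming a data distribution D, a perturbation neighborhood N(x) around each input, and a per-example loss \ell (the paper's exact notation may differ in details), is

R_{adv}(\theta) = \mathbb{E}_{(x, y) \sim \mathcal{D}} \Big[ \max_{x' \in N(x)} \ell(f_\theta(x'), y) \Big],

whereas attack-based evaluations replace the inner maximum with the loss at the point found by one particular attack, so reported robustness is only a lower bound on the true adversarial risk.

The gradient-free attacks used to expose this gap can be built from standard zeroth-order optimizers. The sketch below is a minimal illustration under stated assumptions, not the paper's exact procedure: it uses SPSA-style finite-difference gradient estimation to maximize a model's loss inside an L-infinity ball of radius epsilon. The names loss_fn, epsilon, step_size, num_samples, and fd_delta are hypothetical placeholders.

import numpy as np

def spsa_attack(x, loss_fn, epsilon=0.05, step_size=0.01,
                num_steps=100, num_samples=32, fd_delta=0.01, rng=None):
    # Maximize loss_fn over the L-infinity ball of radius epsilon around x,
    # using only forward evaluations of loss_fn (no model gradients).
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.float64)
    x_adv = x.copy()
    for _ in range(num_steps):
        grad_est = np.zeros_like(x_adv)
        for _ in range(num_samples):
            # Random Rademacher direction, as in SPSA.
            v = rng.choice([-1.0, 1.0], size=x_adv.shape)
            # Two-sided finite difference along v estimates the gradient.
            diff = loss_fn(x_adv + fd_delta * v) - loss_fn(x_adv - fd_delta * v)
            grad_est += diff / (2.0 * fd_delta) * v
        grad_est /= num_samples
        # Signed ascent step, then projection back onto the epsilon-ball
        # and onto the valid input range (assumed here to be [0, 1]).
        x_adv = x_adv + step_size * np.sign(grad_est)
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

# Example usage against a hypothetical classifier's loss for the true label:
# x_adv = spsa_attack(x, lambda z: cross_entropy(model(z), y_true), epsilon=8/255)

Because such an attack needs only forward evaluations of the loss, it is unaffected by gradient masking, which is why it can drive the accuracy of seemingly robust defenses toward zero.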
