Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach

The robustness of neural networks to adversarial examples has received great attention due to its security implications. Despite a variety of attack approaches for crafting visually imperceptible adversarial examples, little has been done toward a comprehensive measure of robustness. In this paper, we provide a theoretical justification for converting robustness analysis into a local Lipschitz constant estimation problem, and propose to use Extreme Value Theory for efficient evaluation. Our analysis yields a novel robustness metric called CLEVER, which is short for Cross Lipschitz Extreme Value for nEtwork Robustness. The proposed CLEVER score is attack-agnostic and computationally feasible for large neural networks. Experimental results on various networks, including ResNet, Inception-v3 and MobileNet, show that (i) CLEVER is aligned with the robustness indication measured by the ℓ2 and ℓ∞ norms of adversarial examples from powerful attacks, and (ii) defended networks using defensive distillation or bounded ReLU indeed achieve better CLEVER scores. To the best of our knowledge, CLEVER is the first attack-independent robustness metric that can be applied to any neural network classifier.
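To illustrate how a local Lipschitz constant estimate translates into a robustness score, below is a minimal, illustrative sketch of an ℓ2 CLEVER-style computation (not the authors' released implementation). It assumes hypothetical callables margin(x) and grad_margin(x) that return the class margin g(x) = f_c(x) - f_j(x) and its gradient; per-batch maxima of sampled gradient norms are fitted with a reverse Weibull distribution, and the fitted location parameter serves as the local cross-Lipschitz estimate.

# Minimal, illustrative sketch of an L2 CLEVER-style score (not the authors'
# released code). `margin(x)` and `grad_margin(x)` are hypothetical callables
# returning g(x) = f_c(x) - f_j(x) and its gradient, respectively.
import numpy as np
from scipy.stats import weibull_max  # the reverse Weibull distribution

def clever_l2(x0, margin, grad_margin, radius=0.5,
              n_batches=50, batch_size=100, seed=0):
    """Estimate an L2 robustness lower bound at x0 via extreme value theory."""
    rng = np.random.default_rng(seed)
    batch_maxima = []
    for _ in range(n_batches):
        grad_norms = []
        for _ in range(batch_size):
            # Draw a point uniformly from the L2 ball of the given radius around x0.
            direction = rng.standard_normal(x0.shape)
            direction /= np.linalg.norm(direction)
            delta = radius * rng.uniform() ** (1.0 / x0.size) * direction
            grad_norms.append(np.linalg.norm(grad_margin(x0 + delta)))
        batch_maxima.append(max(grad_norms))
    # Fit a reverse Weibull to the per-batch maxima; the fitted location
    # parameter (the right endpoint of its support) estimates the local
    # cross-Lipschitz constant of g over the sampling ball.
    _, loc, _ = weibull_max.fit(batch_maxima)
    # CLEVER-style score: a lower-bound-style estimate of the minimal L2
    # distortion needed to flip the margin, capped at the sampling radius.
    return min(margin(x0) / loc, radius)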
