Evaluating Robustness of AI Models against Adversarial Attacks

Recently developed adversarial attacks on neural networks have become more aggressive and dangerous, leaving Artificial Intelligence (AI) models insufficiently robust against them. A set of effective and reliable methods for detecting malicious attacks is therefore essential to ensuring the security of AI models. Such standardized methods can also serve as a reference for researchers developing robust models and new kinds of attacks. This study proposes a method to assess the robustness of AI models. Six commonly used image-classification CNN models were evaluated under 13 types of adversarial attacks. The resulting robustness scores are computed in an unbiased manner and can serve as a reference for further improvement. In contrast to prior related work, our algorithm is attack-agnostic and applicable to any neural network model.
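The abstract does not spell out the scoring procedure, so the following is only a minimal sketch of what an attack-agnostic robustness evaluation could look like: each attack is treated as an interchangeable callable that perturbs inputs, and a model's accuracy under every attack is aggregated into a single score. The PyTorch model, data loader, epsilon value, and function names here are illustrative assumptions, not the authors' implementation; FGSM is included only as one example attack.

```python
# Hedged sketch (not the paper's exact algorithm): attack-agnostic robustness
# scoring for an image classifier. Any perturbation function with the same
# signature as fgsm_attack can be plugged in, matching the "attack-agnostic"
# idea described in the abstract. Model/loader/eps are placeholders.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.03):
    """Example attack (FGSM): one signed-gradient step, clipped to [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

def accuracy_under_attack(model, loader, attack, device="cpu"):
    """Fraction of samples still classified correctly after perturbation."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = attack(model, x, y)          # attack is an arbitrary callable
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total

def evaluate_robustness(model, loader, attacks, device="cpu"):
    """Attack-agnostic score: per-attack accuracy plus their unweighted mean."""
    scores = {name: accuracy_under_attack(model, loader, fn, device)
              for name, fn in attacks.items()}
    scores["overall"] = sum(scores.values()) / len(scores)
    return scores

# Hypothetical usage with a CNN and test loader defined elsewhere:
# robustness = evaluate_robustness(cnn, test_loader, {"FGSM": fgsm_attack})
```

In this sketch the overall score is an unweighted mean over attacks, so adding or swapping attacks does not require changing the evaluation code; the paper's actual aggregation may differ.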
