Adversarial Attack Attribution: Discovering Attributable Signals in Adversarial ML Attacks

Machine Learning (ML) models are known to be vulnerable to adversarial inputs, and researchers have demonstrated that even production systems, such as self-driving cars and ML-as-a-service offerings, are susceptible. These systems are attractive targets for bad actors, and their disruption can cause real physical and economic harm. When attacks on production ML systems occur, the ability to attribute the attack to the responsible threat group is a critical step in formulating a response and holding the attackers accountable. We pose the following question: can adversarially perturbed inputs be attributed to the particular methods used to generate the attack? In other words, is there a signal in these attacks that exposes the attack algorithm, model architecture, or hyperparameters used to produce them? We introduce the concept of adversarial attack attribution and create a simple supervised learning experimental framework to examine the feasibility of discovering attributable signals in adversarial attacks. We find that it is possible to differentiate attacks generated with different attack algorithms, models, and hyperparameters on both the CIFAR-10 and MNIST datasets.
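
The sketch below illustrates the kind of supervised attribution setup the abstract describes; it is our own minimal reconstruction, not the authors' code. Assumptions: a small CNN victim model on MNIST, two hand-rolled attack algorithms (single-step FGSM and an iterative PGD-style attack), and a second "attribution" classifier trained to predict which attack produced each adversarial image. All names (SmallCNN, fgsm, pgd, attributor) are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

class SmallCNN(nn.Module):
    """Small convolutional classifier used both as victim and attributor."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return self.fc(x.flatten(1))

def fgsm(model, x, y, eps=0.3):
    # Single-step attack: move inputs along the sign of the loss gradient.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps=0.3, alpha=0.05, steps=10):
    # Iterative attack: repeated signed-gradient steps projected into the eps-ball.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

# Victim model (assumed already trained on MNIST; training loop omitted).
victim = SmallCNN().to(device)

loader = DataLoader(
    datasets.MNIST("data", train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=128, shuffle=True)

# Build the attribution dataset: adversarial images labeled by the attack
# algorithm that generated them (0 = FGSM, 1 = PGD).
attacks = {0: fgsm, 1: pgd}
adv_images, attack_labels = [], []
for x, y in loader:
    x, y = x.to(device), y.to(device)
    for label, attack in attacks.items():
        adv_images.append(attack(victim, x, y).cpu())
        attack_labels.append(torch.full((x.size(0),), label, dtype=torch.long))
    break  # one batch per attack is enough for this sketch

adv_x = torch.cat(adv_images)
adv_y = torch.cat(attack_labels)

# Attribution classifier: supervised model that predicts the attack algorithm.
attributor = SmallCNN(num_classes=len(attacks)).to(device)
opt = torch.optim.Adam(attributor.parameters(), lr=1e-3)
for _ in range(5):
    opt.zero_grad()
    loss = F.cross_entropy(attributor(adv_x.to(device)), adv_y.to(device))
    loss.backward()
    opt.step()

The same pattern extends to the other attribution targets mentioned in the abstract: label adversarial examples by the victim architecture or by attack hyperparameters (e.g., different eps values) instead of by attack algorithm, and train the attributor on those labels.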
