Novel Adversarial Defense Techniques for White-Box Attacks

In this paper, we propose a novel machine learning technique to detect and correctly classify adversarially modified images. Our approach exploits the adversary's objective (to change the model's prediction using only small perturbations of the original image) by searching for large discrepancies between the input image and the model's output. We cluster samples in both the input space and the output space; when an image's input-space and output-space clusters disagree, we flag the image as potentially adversarial and substitute the input-space cluster's prediction for the attacked model's output. We find that this method tends to flag adversarial samples that are more likely to be misclassified by the attacked model, and that simply deferring to the input-space cluster's prediction on such samples is enough to increase accuracy on adversarial data, even when the attacked model is adversarially trained.
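
The sketch below illustrates this cluster-agreement check in a minimal form. It assumes a simple per-class-centroid clustering with Euclidean distance; the function names (fit_class_centroids, defended_predict, model_logits_fn) and these specific choices are illustrative assumptions, not the exact procedure evaluated in the paper.

```python
# Minimal sketch of the cluster-agreement defense (assumed details: one centroid
# per class in each space, Euclidean distance, logits as the output-space features).
import numpy as np

def fit_class_centroids(features, labels, n_classes):
    """One centroid per class; a stand-in for the clustering step on clean data."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(n_classes)])

def nearest_centroid(x, centroids):
    """Index (class label) of the closest centroid under Euclidean distance."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

def defended_predict(model_logits_fn, x, input_centroids, output_centroids):
    """Flag x as adversarial when its input-space and output-space clusters
    disagree, and fall back to the input-space cluster's label in that case."""
    logits = model_logits_fn(x)
    in_cluster = nearest_centroid(x.ravel(), input_centroids)   # cluster of the raw image
    out_cluster = nearest_centroid(logits, output_centroids)    # cluster of the model output
    is_adversarial = in_cluster != out_cluster
    label = in_cluster if is_adversarial else int(np.argmax(logits))
    return label, is_adversarial
```

In use, the two centroid sets would be fit on clean training data (flattened pixels for the input space, the attacked model's logits for the output space), and defended_predict would then be applied per test image, returning both the corrected label and the adversarial flag.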
