Using Intuition from Empirical Properties to Simplify Adversarial Training Defense

Due to their surprisingly good ability to represent complex distributions, neural network (NN) classifiers are widely used in many tasks, including natural language processing, computer vision, and cyber security. Recent works have noted the existence of adversarial examples. These adversarial examples break the NN classifiers' underlying assumption that the environment is attack-free and can easily mislead a fully trained NN classifier without noticeable changes to the input. Among defensive methods, adversarial training is a popular choice. However, the original adversarial training with single-step adversarial examples (Single-Adv) cannot defend against iterative adversarial examples. Although adversarial training with iterative adversarial examples (Iter-Adv) can defend against iterative adversarial examples, it consumes too much computational power and hence is not scalable. In this paper, we analyze Iter-Adv techniques and identify two of their empirical properties. Based on these properties, we propose modifications that enhance Single-Adv to perform competitively with Iter-Adv. Through preliminary evaluation, we show that the proposed method improves the test accuracy of the state-of-the-art (SOTA) Single-Adv defensive method against iterative adversarial examples by up to 16.93% while reducing its training cost by 28.75%.
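To make the Single-Adv versus Iter-Adv distinction concrete, the sketch below contrasts single-step (FGSM-style) and iterative (PGD-style) adversarial example generation: the iterative variant repeats the gradient computation at every attack step, which is the main source of Iter-Adv's extra training cost. This is a minimal illustrative sketch in PyTorch, not the paper's implementation; the model, loss, and the eps/alpha/steps values are assumptions.

```python
# Minimal sketch (assumed values, not the paper's code) contrasting
# single-step and iterative adversarial example generation.
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=8/255):
    """Single-step (FGSM-style) adversarial example:
    one gradient computation per batch."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()

def pgd_example(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Iterative (PGD-style) adversarial example:
    `steps` gradient computations per batch, hence the higher training cost."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the epsilon-ball around the clean input.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```

In adversarial training, the perturbed batch produced by either routine replaces (or augments) the clean batch in the usual training loop, so the per-epoch cost of Iter-Adv grows roughly with the number of attack steps.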
