Characterizing Adversarial Subspaces by Mutual Information

Deep learning is well known for its strong performance on image classification, object detection, and natural language processing. However, recent research has demonstrated that carefully crafted, visually indistinguishable images called adversarial examples can successfully fool neural networks. In this paper, we design a detector named MID, which computes mutual information to characterize adversarial subspaces, and mount it on the defense framework MagNet. Experimental results show that our method effectively defends against projected gradient descent (PGD), the basic iterative method (BIM), the Carlini and Wagner attack (C&W), and the elastic-net attack to deep neural networks (EAD, with both the elastic-net and L1 decision rules).
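
The abstract does not include code. As a rough illustration of the mutual-information machinery such a detector rests on, the sketch below trains a MINE-style neural estimator (reference [3]) of the Donsker-Varadhan lower bound on I(X;Z); a detector in the spirit of MID could then threshold or compare such MI statistics for clean versus adversarial inputs. The names StatNet and mine_lower_bound, the network sizes, and the use of PyTorch are illustrative assumptions, not details taken from the paper.

```python
# Minimal MINE-style mutual information estimator (Belghazi et al., 2018).
# Illustrative sketch only; not the paper's MID implementation.
import math
import torch
import torch.nn as nn

class StatNet(nn.Module):
    """Statistics network T_theta(x, z) for the Donsker-Varadhan bound."""
    def __init__(self, dim_x, dim_z, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_x + dim_z, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1))

def mine_lower_bound(T, x, z):
    """I(X;Z) >= E_P[T] - log E_{P_x (x) P_z}[exp(T)] (Donsker-Varadhan)."""
    joint = T(x, z).mean()
    # Shuffling z breaks the pairing, giving samples from the product
    # of marginals P_x (x) P_z.
    z_shuffled = z[torch.randperm(z.size(0))]
    marginal = torch.logsumexp(T(x, z_shuffled), dim=0).squeeze() \
        - math.log(z.size(0))
    return joint - marginal

# Toy usage: estimate MI between correlated Gaussians.
torch.manual_seed(0)
x = torch.randn(512, 4)
z = x + 0.5 * torch.randn(512, 4)   # z depends on x, so I(X;Z) > 0
T = StatNet(4, 4)
opt = torch.optim.Adam(T.parameters(), lr=1e-3)
for step in range(500):
    opt.zero_grad()
    loss = -mine_lower_bound(T, x, z)  # maximize the lower bound
    loss.backward()
    opt.step()
print("estimated MI lower bound:", -loss.item())
```

The shuffle trick is the standard way to sample the product of marginals from a joint batch; the tightness of the resulting MI estimate depends on the capacity of the statistics network and the number of training steps.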

[1] Chia-Mu Yu, et al. On the Limitation of MagNet Defense Against L1-Based Adversarial Examples, 2018, 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W).

[2] Samy Bengio, et al. Adversarial Examples in the Physical World, 2016, ICLR.

[3] Aaron C. Courville, et al. MINE: Mutual Information Neural Estimation, 2018, arXiv.

[4] Samy Bengio, et al. Adversarial Machine Learning at Scale, 2016, ICLR.

[5] Pin-Yu Chen, et al. Attacking the Madry Defense Model with L1-Based Adversarial Examples, 2017, ICLR.

[6] Jinfeng Yi, et al. EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples, 2017, AAAI.

[7] Ananthram Swami, et al. Practical Black-Box Attacks Against Machine Learning, 2016, AsiaCCS.

[8] David A. Wagner, et al. Towards Evaluating the Robustness of Neural Networks, 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[9] Aleksander Madry, et al. Towards Deep Learning Models Resistant to Adversarial Attacks, 2017, ICLR.

[10] Chia-Mu Yu, et al. On the Utility of Conditional Generation Based Mutual Information for Characterizing Adversarial Subspaces, 2018, 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP).

[11] Dan Boneh, et al. Ensemble Adversarial Training: Attacks and Defenses, 2017, ICLR.

[12] Hao Chen, et al. MagNet: A Two-Pronged Defense against Adversarial Examples, 2017, CCS.

[13] Ananthram Swami, et al. Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks, 2015, 2016 IEEE Symposium on Security and Privacy (SP).