Improved Detection of Adversarial Attacks via Penetration Distortion Maximization

This paper is concerned with the defense of deep models against adversarial attacks. We develop an adversarial detection method, inspired by the certificate defense approach, that captures the idea of separating class clusters in the embedding space to increase the margin. The resulting defense is intuitive, effective, scalable, and can be integrated into any neural classification model. Our method demonstrates state-of-the-art detection performance under all threat models.
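To make the cluster-separation idea concrete, the following is a minimal PyTorch sketch of an embedding-space loss that pulls same-class embeddings toward their class center and pushes distinct class centers apart, thereby enlarging the margin between clusters. This is an illustrative approximation, not the paper's exact Penetration Distortion Maximization objective; the function name `cluster_separation_loss`, the hinge margin, and the weighting are our assumptions.

```python
# Illustrative sketch only: an embedding-space cluster-separation loss,
# not the authors' exact PDM objective.
import torch
import torch.nn.functional as F


def cluster_separation_loss(embeddings, labels, num_classes, push_margin=10.0):
    """Pull embeddings toward their class centers; push centers apart.

    embeddings: (N, D) tensor, e.g. the penultimate layer of any classifier.
    labels:     (N,) integer class labels.
    """
    # Per-class centers computed from the current batch.
    centers = []
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            centers.append(embeddings[mask].mean(dim=0))
        else:
            # Class absent from this batch: placeholder center.
            centers.append(torch.zeros_like(embeddings[0]))
    centers = torch.stack(centers)  # (C, D)

    # Pull term: distance of each embedding to its own class center.
    pull = F.mse_loss(embeddings, centers[labels])

    # Push term: hinge on pairwise distances between distinct class centers.
    dists = torch.cdist(centers, centers)  # (C, C)
    off_diag = ~torch.eye(num_classes, dtype=torch.bool, device=dists.device)
    push = F.relu(push_margin - dists[off_diag]).mean()

    return pull + push


# Usage (assumed training loop): add the term to the usual cross-entropy loss,
#   logits, emb = model(x)   # model assumed to return logits and embeddings
#   loss = F.cross_entropy(logits, y) + 0.1 * cluster_separation_loss(emb, y, 10)
```

Any margin- or center-based separation loss of this kind can be attached to an existing classifier without changing its architecture, which is consistent with the abstract's claim that the defense integrates into any neural classification model.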
